Abstract

Tensors play a central role in many modern machine learning and signal processing applications.
In such applications, the target tensor is usually of low rank, i.e., it can be expressed
as a sum of a small number of rank-one tensors. This motivates us to consider the problem
of low rank tensor recovery from a class of linear measurements called separable measurements.
As specific examples, we focus on two distinct types of separable measurement mechanisms: (a)
random projections, where each measurement corresponds to an inner product of the tensor
with a suitable random tensor, and (b) the completion problem, where the measurements constitute
the revelation of a random set of entries. We present a computationally efficient algorithm,
with rigorous and order-optimal sample complexity results (up to logarithmic factors) for tensor
recovery. Our method is based on a reduction to matrix completion sub-problems and an adaptation
of Leurgans’ method for tensor decomposition. We extend the methodology and sample
complexity results to higher order tensors, and experimentally validate our theoretical results.


1 Introduction 

Tensors provide compact representations for multi-dimensional, multi-perspective data in many
problem domains, including image and video processing [50, 33, 25], collaborative filtering [26, 16],
statistical modeling, array signal processing, psychometrics [48, 42], neuroscience,
and large-scale data analysis. In this paper we consider the problem
of tensor recovery: given partial information about a tensor via linear measurements, one wishes to
learn the entire tensor. While this inverse problem is ill-posed in general, we will focus on the
setting where the underlying tensor is simple. The notion of simplicity that we adopt is based on
the (Kruskal) rank of the tensor, which, much like the matrix rank, is of fundamental importance:
tensors of lower rank have fewer constituent components and are hence simple. For example, video
sequences are naturally modeled as tensors, and these third order tensors have low rank as a result
of homogeneous variations in the scene. Unlike the matrix case, however, computational tasks
related to the tensor rank, such as spectral decompositions, rank computation, and regularization,
are computationally intractable in the worst case [28].

We focus on linear inverse problems involving tensors. Linear measurements of an unknown
tensor $\mathbf{X}$ are specified by $y = \mathcal{L}(\mathbf{X})$, where $\mathcal{L}$ is a linear operator and $y \in \mathbb{R}^m$. Here the quantity
$m$ refers to the number of measurements, and the minimum number of measurements⁴ $m$ required
to reliably recover $\mathbf{X}$ (called the sample complexity) is of interest. While in general such problems
are ill-posed and unsolvable when $m$ is smaller than the dimensionality of $\mathbf{X}$, the situation is more
interesting when the underlying signal (tensor) is structured, and the sensing mechanism $\mathcal{L}(\cdot)$ is
able to exploit this structure. For instance, similar ill-posed problems are solvable, even if $m$ is
substantially lower than the ambient dimension, when the underlying signal is a sparse vector or
a low-rank matrix, provided that $\mathcal{L}(\cdot)$ has appropriate properties.

We focus for the most part on tensors of order 3, and show later that all our results extend to the 
higher order case in a straightforward way. We introduce a class of measurement operators known 
as separable measurements, and present an algorithm for low-rank tensor recovery for the same. 
We focus on two specific measurement mechanisms that are special cases of separable mechanisms: 

• Separable random projections: For tensors of order 3, we consider observations where the i-th
measurement is of the form $\mathcal{L}_i(\mathbf{X}) := \langle a \otimes A_i, \mathbf{X} \rangle$, where $a$ is a random unit vector, $A_i$ is a
random matrix, and $\otimes$ represents an outer product of the two. For higher order tensors, the
measurements are defined in an analogous manner. Here $\langle \cdot, \cdot \rangle$ is the tensor inner product (to
be made clear in the sequel).

• Completion: The measurements here are simply a subset of the entries of the true tensor. 
The entries need to be restricted to merely four slices of the tensor, and can be random within 
these slices. 

For both the random projection and completion settings, we analyze the performance of our
algorithm and prove sample complexity bounds.

The random sampling mechanisms mentioned above are of relevance in practical applications. 
For instance, the Gaussian random projection mechanism described above is a natural candidate 
for compressively sampling video and multi-dimensional imaging data. For applications where such 
data is “simple” (in the sense of low rank), the Gaussian sensing mechanism may be a natural 
means of compressive encoding. 

The completion framework is especially relevant to machine learning applications. For instance,
it is useful in the context of multi-task learning, where each task in a collection of interrelated
tasks corresponds to matrix completion. Consider the task of predicting ratings assigned
by users to different clothing items; this is naturally modeled as a matrix completion problem [12].
Similarly, the task of predicting ratings assigned by the same set of users to accessories is another
matrix completion problem. The multi-task problem of jointly predicting the ratings assigned by the users
to baskets of items consisting of both clothing items and accessories is a tensor completion problem.

Another application of tensor completion is that of extending the matrix completion framework
to contextual recommendation systems. In the standard setup, one is given a rating matrix that is indexed
by users and items, and the entries correspond to the ratings given by different users to different
items. Each user provides ratings for only a fraction of the items (these constitute the sensing
operator $\mathcal{L}(\cdot)$), and one wishes to infer the ratings for all the others. Assuming that such a rating
matrix is low rank is equivalent to assuming the presence of a small number of latent variables that
drive the rating process. An interesting twist to this setup, which requires a tensor based approach,
is contextual recommendation, i.e. where different users provide ratings in different contexts (e.g.,
location, time, activity). Such a setting is naturally modeled via tensors; the three modes of the
tensor are indexed by users, items, and contexts. The underlying tensor may be assumed to be low
rank to model the small number of latent variables that influence the rating process. In this setting,
our approach would need a few samples about users making decisions in two different contexts (this
corresponds to two slices of the tensor along the third mode), and enough information about two
different users providing ratings in a variety of different contexts (these are two slices along the
first mode). Once the completion problem restricted to these slices is solved, one can complete the
entire tensor by performing simple linear algebraic manipulations.

4 We use the terms measurements and samples interchangeably.

Of particular note concerning our algorithm and the performance guarantees are the following: 

• Sample complexity: In the absence of noise, our algorithm, named T-ReCs (Tensor
Recovery via Contractions), provably and exactly recovers the true tensor. Its sample
complexity for exact recovery is order-optimal in the context of random sensing, and
order-optimal modulo logarithmic factors in the context of tensor completion. Specifically,
for a third order tensor of rank $r$ and largest dimension $n$, the sample complexity is $O(nr)$
for random projections and $O(nr \log^2 n)$ for completion (see Table 1).
• Factorization: Equally important is the fact that our method recovers a minimal rank
factorization in addition to the unknown tensor. This is of importance in applications such as
dimension reduction and also latent variable models [2] involving tensors, where the factorization
itself holds meaningful interpretational value.

• Absence of strong assumptions: Unlike some prior art, our analysis relies only on relatively
weak assumptions: namely that the rank of the tensor be smaller than the (smallest) dimension,
that the factors in the rank decomposition be linearly independent and non-degenerate,
and (for the case of completion) other standard assumptions such as incoherence between
the factors and the sampling operator. We do not, for instance, require orthogonality-type
assumptions on the said factors, as some prior work does.

• Computational efficiency: Computationally, our algorithm essentially reduces to linear
algebraic operations and the solution of matrix nuclear norm (convex) optimization
sub-problems, and is hence tractable. Furthermore, our nuclear norm minimization
methods deal with matrices that are potentially much smaller, up to factors of $n$, than those in
competing methods that “matricize” the tensor via unfolding [35, 56]. In addition to recovering
the true underlying tensor, the algorithm also produces its unique rank decomposition.

• Simplicity: Our algorithm is conceptually simple, both to implement and to analyze.
Indeed the algorithm and its analysis follow in a transparent manner from Leurgans’ algorithm
(a simple linear algebraic approach for tensor decomposition) and standard results for
low-rank matrix recovery and completion. We find this intriguing, especially considering the
“hardness” of most tensor problems [21]. Recent work in the area of tensor learning
has focused on novel regularization schemes and algorithms for learning low rank tensors;
the proposed approach potentially obviates the need for developing these in the context of
separable measurements.

The fundamental insight in this work is that while solving the tensor recovery problem directly
may seem challenging (for example we do not know of natural tractable extensions of the “nuclear
norm” for tensors), very important information is encoded in a two-dimensional matrix “sketch”
of the tensor which we call a contraction. (This idea seems to first appear in [31], and has since been
expanded upon in the context of tensor decomposition.) These sketches are formed by taking linear
combinations of two-dimensional slices of the underlying tensor; indeed the slices themselves may
be viewed as “extremal” contractions. For the Gaussian random projections case, the contractions
will be random linear combinations of slices, whereas for the completion setting the contractions we
work with will be the slices themselves, randomly subsampled. Our method focuses on recovering
these contractions efficiently (using matrix nuclear norm regularization) as a first step, followed by
additional processing to recover the true tensor.

1.1 Related Work and Key Differences 

With a view to computational tractability, the notion of Tucker rank of a tensor has been explored;
this involves matricizations along different modes of the tensor and the ranks of the associated
matrices. Based on the idea of Tucker rank, Tomioka et al. [56] have proposed and analyzed
a nuclear norm heuristic for tensor completion, thereby bringing tools from matrix completion
[12] to bear on the tensor case. Mu et al. have extended this idea further by studying
reshaped versions of tensor matricizations. However, to date, the sample complexity associated
with matrix-based regularization seems to be orders of magnitude away from the anticipated sample
complexity (for example, based on a count of the degrees of freedom in the problem) [35]. In this paper we resolve
this conundrum by providing an efficient algorithm that provably enjoys order-optimal sample
complexity in the order, dimension, and rank of the tensor.

In contrast to the matricization approach, alternative approaches for tensor completion with
provable guarantees have appeared in the literature. In the restricted setting where the tensor
has a symmetric factorization [5] (in contrast, we are able to work in the general non-symmetric
setting), the authors propose employing the Lasserre hierarchy via a semidefinite programming
based approach. Unfortunately, the method proposed in [5] is not scalable: it requires solving
optimization problems at the 6th level of the Lasserre hierarchy, which makes solving even
moderate-sized problems numerically impractical, as the resulting semidefinite programs grow rapidly with
the dimension. Furthermore, the guarantees provided in [5] are of a different flavor: they provide
error bounds in the noisy setting, whereas we provide exact recovery results in the noiseless setting.
Alternative methods based on thresholding in the noisy setting have also been studied in [4]. An
alternating minimization approach for tensor completion was proposed in [23]. That approach
also relies on restrictive assumptions, namely that the underlying tensor be symmetric and orthogonally
decomposable (we make no such assumptions), and its sample complexity bounds do not scale
optimally with the dimensions or the rank. Unlike alternating minimization schemes, which are
efficient but rely on careful initializations, our method directly solves convex optimization programs
followed by linear algebraic manipulations. Also relevant is [49], where the authors propose solving
tensor completion using the tensor nuclear norm regularizer; this approach is not known to be
computationally tractable (no polynomial time algorithm is known for minimizing the tensor nuclear
norm) and the guarantees they obtain do not scale optimally with the dimension and rank. Finally, a
method based on the tubal rank and t-SVD of a tensor has also recently been proposed; however,
its sample complexity does not scale optimally. As a final point of contrast to the aforementioned
work, our method is also conceptually very simple, both to implement and to analyze.

In Table 1 we provide a brief comparison of the relevant approaches, their sample complexities
in both the third order and higher order settings, as well as a few key features of each approach.


| Reference | Sample complexity (3rd order) | Sample complexity (K-th order) | Key features |
|---|---|---|---|
| Tomioka et al. [56] | $O(rn^2)$ | $O(rn^{K-1})$ | Tucker rank, tensor unfolding |
| Mu et al. | $O(rn^2)$ | $O(r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil})$ | Tucker rank, tensor unfolding |
| [23] | $O(r^5 n^2 \log^5 n)$ | - | Kruskal rank, alternating minimization, orthogonally decomposable tensors, symmetric setting, completion only |
| t-SVD based method | $O(rn^2 \log n)$ | - | Tensor tubal rank, completion only |
| [49] | $O(r^{1/2} (n \log n)^{3/2})$ | $O(n^{K/2}\,\mathrm{polylog}(n))$ | Kruskal rank, exact tensor nuclear norm minimization, computationally intractable, completion only |
| Our method | $O(nr)$ (random projection); $O(nr \log^2 n)$ (completion) | $O(Knr)$ (random projection); $O(Knr \log^2 n)$ (completion) | Kruskal rank, separable measurements, Leurgans’ algorithm |

Table 1: Comparison of the sample complexities of various approaches.


The rest of the paper is organized as follows: in Section 2 we introduce the problem setup and
describe the approach and result in the most general setting. We also describe Leurgans’ algorithm,
an efficient linear algebraic algorithm for tensor decomposition, which our results build upon. In
Section 3 we specialize our results for both the random projections and the tensor completion
cases. We extend these results and our algorithm to higher order tensors in Section 4. We perform
experiments that validate our theoretical results in Section 5. In Section 6 we conclude the paper
and outline future directions.

2 Approach and Basic Results

In this paper, vectors are denoted using lower-case characters (e.g. $x, y, a, b$, etc.), matrices by
upper-case characters (e.g. $X, Y$, etc.), and tensors by upper-case bold characters (e.g. $\mathbf{X}, \mathbf{T}, \mathbf{A}$,
etc.). Given two third order tensors $\mathbf{A}, \mathbf{B}$, their inner product is defined as:

$$\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i,j,k} \mathbf{A}_{ijk} \mathbf{B}_{ijk}.$$

The Euclidean norm of a tensor $\mathbf{A}$ is generated by this inner product, and is a straightforward
extension of the matrix Frobenius norm:

$$\|\mathbf{A}\|_F^2 := \langle \mathbf{A}, \mathbf{A} \rangle.$$


We will work with tensors of third order (representationally to be thought of as three-way
arrays), and the term mode refers to one of the axes of the tensor. A slice of a tensor refers to a
two-dimensional matrix generated from the tensor by varying indices along two modes while keeping
the third mode fixed. For a tensor $\mathbf{X}$ we will refer to the indices of the i-th mode-1 slice (i.e., the
slice corresponding to the indices $\{i\} \times [n_2] \times [n_3]$) by $S_i^{(1)}$, where $[n_2] = \{1, 2, \ldots, n_2\}$ and $[n_3]$ is
defined similarly. We denote the matrix corresponding to $S_i^{(1)}$ by $X_i^1$. Similarly, the indices of the
k-th mode-3 slice will be denoted by $S_k^{(3)}$ and the matrix by $X_k^3$.
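
As a concrete illustration of the notation above, the following NumPy sketch computes the inner product, the induced norm, and mode-1 and mode-3 slices. It assumes the tensor is stored as a dense array whose axes are ordered (mode 1, mode 2, mode 3) and uses 0-based indices; the helper names are illustrative and do not come from the paper.

```python
import numpy as np

def inner(A, B):
    """Tensor inner product: sum over i, j, k of A[i, j, k] * B[i, j, k]."""
    return float(np.sum(A * B))

def frob_norm(A):
    """Euclidean norm induced by the inner product."""
    return float(np.sqrt(inner(A, A)))

def mode1_slice(X, i):
    """Matrix of size n2 x n3 obtained by fixing the mode-1 index i."""
    return X[i, :, :]

def mode3_slice(X, k):
    """Matrix of size n1 x n2 obtained by fixing the mode-3 index k."""
    return X[:, :, k]

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))   # n1, n2, n3 = 3, 4, 5
Y = rng.standard_normal((3, 4, 5))

print(inner(X, Y), frob_norm(X))
print(mode1_slice(X, 0).shape, mode3_slice(X, 0).shape)   # (4, 5) (3, 4)
```

Since the mode-3 slices partition the entries of the tensor, the squared norm of the tensor equals the sum of the squared Frobenius norms of its slices.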

Given a tensor of interest $\mathbf{X}$, consider its decomposition into rank one tensors

$$\mathbf{X} = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i, \qquad (1)$$

where $\{u_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_1}$, $\{v_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_2}$, and $\{w_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_3}$. Here $\otimes$ denotes the tensor
product, so that $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is a tensor of order 3 and dimension $n_1 \times n_2 \times n_3$. Without loss
of generality, throughout this paper we assume that $n_1 \le n_2 \le n_3$. We will first present our results
for third order tensors, and analogous results for higher orders follow in a transparent manner. We
will be dealing with low-rank tensors, i.e. those tensors with $r \le n_1$. Tensors can have rank larger
than the dimension; indeed $r > n_3$ is an interesting regime, but it is far more challenging and will not
be dealt with here.
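
To make decomposition (1) concrete, here is a minimal NumPy sketch that assembles a rank-$r$ tensor from factor matrices whose columns are the $u_i$, $v_i$, $w_i$. The function name `cp_tensor` and the Gaussian factors are assumptions made for illustration only.

```python
import numpy as np

def cp_tensor(U, V, W):
    """Return sum_i u_i (x) v_i (x) w_i, where u_i, v_i, w_i are the i-th columns."""
    return np.einsum('ir,jr,kr->ijk', U, V, W)

rng = np.random.default_rng(1)
n1, n2, n3, r = 4, 5, 6, 2            # n1 <= n2 <= n3 and r <= n1
U = rng.standard_normal((n1, r))
V = rng.standard_normal((n2, r))
W = rng.standard_normal((n3, r))
X = cp_tensor(U, V, W)

# The same tensor built one rank-one term at a time, mirroring equation (1).
X_terms = np.zeros((n1, n2, n3))
for i in range(r):
    X_terms += np.multiply.outer(np.multiply.outer(U[:, i], V[:, i]), W[:, i])

print(X.shape, np.allclose(X, X_terms))
```

Entry-wise, this says that $\mathbf{X}_{ijk} = \sum_{l=1}^{r} (u_l)_i (v_l)_j (w_l)_k$.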

Kruskal’s Theorem [29] guarantees that tensors satisfying Assumption 2.1 below have a unique
minimal decomposition into rank one terms of the form (1). The minimal number of terms is called
the (Kruskal) rank⁵ of the tensor $\mathbf{X}$.

Assumption 2.1. The sets $\{u_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_1}$ and $\{v_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_2}$ are sets of linearly independent
vectors, and the set $\{w_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_3}$ is a set of pairwise independent vectors.

While rank decomposition of tensors in the worst case is known to be computationally
intractable, it is known that the (mild) assumption stated in Assumption 2.1 above suffices for
an algorithm known as Leurgans’ algorithm to correctly identify the factors in this unique
decomposition. In this paper, we will work with the following, somewhat stronger assumption:

Assumption 2.2. The sets $\{u_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_1}$, $\{v_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_2}$, and $\{w_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_3}$ are sets
of linearly independent vectors.

5 The Kruskal rank is also known as the CP rank in the literature.

2.1 Separable Measurement Mechanisms

As indicated in the preceding discussion, we are interested in tensor linear inverse problems
where, given measurements of the form $y_i = \mathcal{L}_i(\mathbf{X})$, $i = 1, 2, \ldots, m$, we recover the unknown
tensor $\mathbf{X}$. We focus on a class of measurement mechanisms $\mathcal{L}(\cdot)$ which have a special property
which we call separability. We define the notion of separable measurements formally:

Definition 2.1. Consider a linear operator $\mathcal{L} : \mathbb{R}^{n_1 \times n_2 \times n_3} \to \mathbb{R}^{n}$. We say that $\mathcal{L}$ is separable with
respect to the third mode if there exist $w \in \mathbb{R}^{n_3}$ and a linear operator $\mathcal{T} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^{n}$ such that
for every $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$:

$$\mathcal{L}(\mathbf{X}) = \sum_{i=1}^{n_3} w_i \, \mathcal{T}(X_i^3).$$

This definition extends in a natural way to separability of operators with respect to the second
and first modes. In words, separability means that the effect of the linear operator $\mathcal{L}(\cdot)$ on a tensor
can be decomposed into the (weighted) sum of actions of a single linear operator $\mathcal{T}(\cdot)$ acting on
slices of the tensor along a particular mode.

In several applications involving inverse problems, the design of appropriate measurement
mechanisms is itself of interest. Indeed, sensing methods that lend themselves to recovery from a small
number of samples via efficient computational techniques have been intensely studied in the signal
processing, compressed sensing, and machine learning literature. In the
context of tensors, we argue, separability of the measurement operator is a desirable property for
precisely these reasons: it lends itself to recovery with almost optimal sample complexity
via scalable computational methods (see for example [30] for rank one measurement operators in
the matrix case). We now describe a few interesting measurement mechanisms that are separable.


1. Separable random projections: Given a matrix $M \in \mathbb{R}^{n_1 \times n_2}$ and vector $v \in \mathbb{R}^{n_3}$, we
define the following two notions of “outer products” of $M$ and $v$:

$$[M \otimes v]_{ijk} := M_{ij} v_k, \qquad [v \otimes M]_{ijk} := v_i M_{jk}.$$

Hence, the k-th mode-3 slice of the tensor $M \otimes v$ is the matrix $v_k M$. Similarly, the i-th mode-1
slice of the tensor $v \otimes M$ is the matrix $v_i M$.

A typical random separable projection is of the form:

$$\mathcal{L}(\mathbf{X}) = \begin{bmatrix} \langle A_1 \otimes a, \mathbf{X} \rangle \\ \vdots \\ \langle A_m \otimes a, \mathbf{X} \rangle \end{bmatrix}, \qquad (2)$$

where $A_i \in \mathbb{R}^{n_1 \times n_2}$ is a random matrix drawn from a suitable ensemble such as the Gaussian
ensemble with each entry drawn independently and identically from $\mathcal{N}(0,1)$, and $a \in \mathbb{R}^{n_3}$ is
also a random vector, for instance distributed uniformly on the unit sphere in $n_3$ dimensions
(i.e. with each entry drawn independently and identically from $\mathcal{N}(0,1)$ and then suitably
normalized).

To see that such measurements are separable, note that:

$$\langle A_i \otimes a, \mathbf{X} \rangle = \sum_{k=1}^{n_3} a_k \langle A_i, X_k^3 \rangle,$$

so that the operator $\mathcal{T}(\cdot)$ from Definition 2.1 in this case is simply given by:

$$\mathcal{T}(X) = \begin{bmatrix} \langle A_1, X \rangle \\ \vdots \\ \langle A_m, X \rangle \end{bmatrix}.$$

Random projections are of basic interest in signal processing, and have played a key role in
the development of the sparse recovery and low rank matrix recovery literature [39]. From an
application perspective they are relevant because they provide a method of compressive and
lossless coding of “simple signals” such as sparse vectors and low rank matrices [39]. In
subsequent sections we will establish that separable random projections share this desirable
feature for low-rank tensors.


2. Tensor completion: In tensor completion, a subset of the entries of the tensor $\mathbf{X}$ is
revealed. Specifically, given a tensor $\mathbf{X}$, the entries $\mathbf{X}_{ijk}$ for $(i,j,k) \in \Omega$ are
revealed for some index set $\Omega \subseteq [n_1] \times [n_2] \times [n_3]$ (we denote this by $(\mathbf{X})_\Omega$). Whether or not
the measurements are separable depends upon the nature of the set $\Omega$. For the i-th mode-1
slice let us define

$$\Omega_i^{(1)} := \Omega \cap S_i^{(1)}.$$

Measurements derived from entries within a single slice of the tensor are separable. This
follows from the fact that for $\mathcal{L}(\mathbf{X}) := (\mathbf{X})_{\Omega_i^{(1)}}$, we have:

$$\mathcal{L}(\mathbf{X}) = \sum_{j=1}^{n_1} (\delta_i)_j \, \mathcal{M}_{\Omega_i^{(1)}}\left(X_j^1\right),$$

where $\delta_i \in \mathbb{R}^{n_1}$ is a vector with a one in the i-th index and zeros elsewhere, and $\mathcal{M}_{\Omega}$ is the operator
that acts on a matrix $X$, extracts the entries corresponding to the index set $\Omega$, and returns the
resulting vector. Comparing to Definition 2.1, we have $w = \delta_i$ and $\mathcal{T} = \mathcal{M}_{\Omega_i^{(1)}}$. As a trivial
extension, measurements obtained from parallel slices where the index set restricted to these
slices is identical are also separable.

Analogous to matrix completion, tensor completion is an important problem due to its
applications to machine learning; the problems of multi-task learning and contextual
recommendation are both naturally modeled in this framework, as described in Section 1.


3. Rank one projections: Another separable sensing mechanism of interest is via rank-one
projections of the tensor of interest. Specifically, measurements of the form:

$$\mathcal{L}(\mathbf{X}) = \begin{bmatrix} \langle a_1 \otimes b_1 \otimes c, \mathbf{X} \rangle \\ \vdots \\ \langle a_m \otimes b_m \otimes c, \mathbf{X} \rangle \end{bmatrix}$$

are also separable. Mechanisms of this form have recently gained interest in the context
of low rank (indeed rank-one) matrices due to their appearance in the context of phase
retrieval problems [13] and statistical estimation [30, 19]. We anticipate that studying rank
one projections in the context of tensors will give rise to interesting applications in a similar
spirit.

4. Separable sketching: The notion of covariance sketching (and more generally, matrix sketching)
[19] allows for the possibility of compressively acquiring a matrix $X$ via measurements
$Y = A X B^T$, where $A \in \mathbb{R}^{m_1 \times p}$, $B \in \mathbb{R}^{m_2 \times p}$, $X \in \mathbb{R}^{p \times p}$, and $m_1, m_2 < p$. The problem of
recovering $X$ from such measurements is of interest in various settings, such as when one is
interested in recovering a covariance matrix from compressed sample paths, and graph
compression. In a similar spirit, we introduce the notion of separable sketching of tensors,
defined via:

$$Y_{qs} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} A_{qi} B_{sj} c_k \mathbf{X}_{ijk}.$$

In the above, $A \in \mathbb{R}^{m_1 \times n_1}$, $B \in \mathbb{R}^{m_2 \times n_2}$, $c \in \mathbb{R}^{n_3}$, and $Y \in \mathbb{R}^{m_1 \times m_2}$. Note that $\mathcal{T}(Z) = A Z B^T$,
i.e. precisely a matrix sketch of tensor slices. The problem of recovering $\mathbf{X}$ from $Y$ is thus a
natural extension of matrix sketching to tensors.

Finally, we note that while a variety of separable sensing mechanisms are proposed above,
many sensing mechanisms of interest are not separable. For instance, a measurement of the
form $\mathcal{L}(\mathbf{X}) = \langle \mathbf{A}, \mathbf{X} \rangle$ where $\mathbf{A}$ is a full rank tensor is not separable. Similarly, completion
problems where entries of the tensor are revealed randomly and uniformly throughout the
tensor (as opposed to from a single slice) are also not separable (although they may be thought
of as a union of separable measurements). In Section 3 we will provide sample complexity
bounds for exact recovery for the first two aforementioned measurement mechanisms (i.e.
random projections and tensor completion); the arguments extend in a natural manner to
other separable sensing mechanisms.
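
The separability of the mechanisms listed above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's implementation: the dimensions, the Gaussian ensembles, the 50% sampling mask, and all variable names (`T_proj`, `T_mask`, `SA`, `SB`) are hypothetical. For each mechanism it compares the direct measurement with the weighted sum of a single operator applied to slices, as in Definition 2.1.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3, m = 4, 5, 6, 7
X = rng.standard_normal((n1, n2, n3))

# Mechanism 1: separable random projections, <A_i (x) a, X>.
A = rng.standard_normal((m, n1, n2))          # A[i] plays the role of A_i
a = rng.standard_normal(n3)
a /= np.linalg.norm(a)                        # normalized Gaussian vector

def T_proj(Z):
    """T(Z) = (<A_1, Z>, ..., <A_m, Z>) for an n1 x n2 matrix Z."""
    return np.einsum('mij,ij->m', A, Z)

y_direct = np.einsum('mij,k,ijk->m', A, a, X)
y_separable = sum(a[k] * T_proj(X[:, :, k]) for k in range(n3))

# Mechanism 2: completion restricted to a single mode-1 slice.
i0 = 1
mask = rng.random((n2, n3)) < 0.5             # observed positions inside the slice
delta = np.zeros(n1)
delta[i0] = 1.0                               # weight vector w = delta_i

def T_mask(Z):
    """Extract the observed entries of an n2 x n3 matrix as a vector."""
    return Z[mask]

obs_direct = X[i0][mask]
obs_separable = sum(delta[j] * T_mask(X[j]) for j in range(n1))

# Mechanism 4: separable sketching, with T(Z) = SA Z SB^T.
m1, m2 = 2, 3
SA = rng.standard_normal((m1, n1))
SB = rng.standard_normal((m2, n2))
c = rng.standard_normal(n3)
Y_direct = np.einsum('qi,sj,k,ijk->qs', SA, SB, c, X)
Y_separable = sum(c[k] * (SA @ X[:, :, k] @ SB.T) for k in range(n3))

print(np.allclose(y_direct, y_separable),
      np.allclose(obs_direct, obs_separable),
      np.allclose(Y_direct, Y_separable))
```

In each case the weight vector ($a$, $\delta_i$, or $c$) multiplies the output of one fixed operator applied slice by slice, which is exactly the structure required by the definition.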


2.1.1 Diversity in the Measurement Set 

In order to recover the low rank tensor from a few measurements using our algorithm, we need the
set of measurements to be a union of separable measurements which satisfy the following:

1. Diversity across modes: Measurements of the form (2) are separable with respect to the
third mode. For the third order case, we also need an additional set of measurements separable
with respect to the first mode.⁶ This extends naturally to the higher order case.

2. Diversity across separable weights: Recalling the notion of weight vectors, $w$, from
Definition 2.1, we require that for both modes 1 and 3, each mode has two distinct sets of
separable measurements with distinct weight vectors.

To make the second point more precise later, we introduce the formal notation we will use in
the rest of the paper for the measurement operators:

$$y_k^{(i)} = \mathcal{L}_k^{(i)}(\mathbf{X}) = \sum_{j=1}^{n_i} \left(w_k^{(i)}\right)_j \mathcal{T}_k^{(i)}\left(X_j^i\right).$$

In the above, the index $i \in \{1,3\}$ refers to the mode with respect to which that measurement is
separable. For each mode, we have two distinct sets of measurements corresponding to two different
weight vectors $w_k^{(i)}$, with $k \in \{1,2\}$. For each $k$ and $i$, we may have potentially different operators
$\mathcal{T}_k^{(i)}$ (though they need not be different). To simplify notation, we will subsequently assume that
$\mathcal{T}_1^{(i)} = \mathcal{T}_2^{(i)} = \mathcal{T}^{(i)}$. Collectively, all these measurements will be denoted by:

$$y = \mathcal{L}(\mathbf{X}),$$

where it is understood that $y$ is a concatenation of the vectors $y_k^{(i)}$ and similarly $\mathcal{L}(\cdot)$ is a
concatenation of the $\mathcal{L}_k^{(i)}(\cdot)$. We will see in the subsequent sections that when we have diverse measurements
across different modes and different weight vectors, and when the $\mathcal{T}^{(i)}$ are chosen suitably, one can
efficiently recover an unknown tensor from an (almost) optimal number of measurements of the
form $y = \mathcal{L}(\mathbf{X})$.

6 Any two modes suffice. In this paper we will focus on separability w.r.t. the first and third modes.
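
The diversity requirements can be illustrated with a small sketch that assembles the full measurement vector for the random projection setting: two weight vectors for mode 1, two for mode 3, and one Gaussian operator per mode. All names (`G`, `w`, `measure`, `slices`) and sizes are assumptions for illustration; the paper does not prescribe this code.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, n3, m = 4, 5, 6, 8
X = rng.standard_normal((n1, n2, n3))

def unit(v):
    return v / np.linalg.norm(v)

# One Gaussian operator per mode: mode 1 acts on n2 x n3 slices, mode 3 on n1 x n2 slices.
G = {1: rng.standard_normal((m, n2, n3)), 3: rng.standard_normal((m, n1, n2))}

# Two distinct weight vectors per mode, indexed by (mode, k) with k in {1, 2}.
w = {(1, k): unit(rng.standard_normal(n1)) for k in (1, 2)}
w.update({(3, k): unit(rng.standard_normal(n3)) for k in (1, 2)})

def slices(X, mode):
    """List of slices of X along the given mode (1 or 3)."""
    if mode == 1:
        return [X[j, :, :] for j in range(X.shape[0])]
    return [X[:, :, j] for j in range(X.shape[2])]

def measure(X, mode, k):
    """Weighted sum over slices of the mode's operator, using weight vector k."""
    def T(Z):
        return np.einsum('mab,ab->m', G[mode], Z)
    return sum(wj * T(S) for wj, S in zip(w[(mode, k)], slices(X, mode)))

# Concatenate the four measurement sets: modes (1, 3) times weight indices (1, 2).
y = np.concatenate([measure(X, i, k) for i in (1, 3) for k in (1, 2)])
print(y.shape)   # (32,), i.e. 4 sets of m = 8 measurements
```

By linearity, each block of `y` equals the mode's operator applied to a weighted sum of slices, so every block carries information about one matrix-valued combination of slices.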

2.2 Tensor Contractions 

A basic ingredient in our approach is the notion of a tensor contraction. This notion will allow us to 
form a bridge between inverse problems involving tensors and inverse problems involving matrices, 
thereby allowing us to use matrix-based techniques to solve tensor inverse problems. 

For a tensor $\mathbf{X}$, we define its mode-3 contraction with respect to a contraction vector $a \in \mathbb{R}^{n_3}$,
denoted by $X_a^3 \in \mathbb{R}^{n_1 \times n_2}$, as the following matrix:

$$[X_a^3]_{ij} = \sum_{k=1}^{n_3} \mathbf{X}_{ijk} a_k, \qquad (3)$$

so that the resulting matrix is a weighted sum of the mode-3 slices of the tensor $\mathbf{X}$. We similarly
define the mode-1 contraction with respect to a vector $c \in \mathbb{R}^{n_1}$ as

$$[X_c^1]_{jk} = \sum_{i=1}^{n_1} \mathbf{X}_{ijk} c_i. \qquad (4)$$

Note that when $a = e_k$, a standard unit vector, $X_a^3 = X_k^3$, i.e. a tensor slice. We will primarily be
interested in two notions of contraction in this paper:

• Random Contractions, where $a$ is a random vector distributed uniformly on the unit sphere.
These will play a role in our approach for recovery from random projections.

• Coordinate Contractions, where $a$ is a canonical basis vector, so that the resulting contraction
is a tensor slice. These will play a role in our tensor completion approach.

We now state a basic result concerning tensor contractions.

Lemma 2.1. Let $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, with $n_1 \le n_2 \le n_3$, be a tensor of rank $r \le n_1$. Then the rank
of $X_a^3$ is at most $r$. Similarly, if $r \le \min\{n_2, n_3\}$ then the rank of $X_c^1$ is at most $r$.

Proof. Consider a tensor $\mathbf{X} = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$. The reader may verify in a straightforward manner
that $X_a^3$ enjoys the decomposition:

$$X_a^3 = \sum_{i=1}^{r} \langle w_i, a \rangle u_i v_i^T. \qquad (5)$$

The proof for the rank of $X_c^1$ is analogous. □
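
The contraction and the decomposition (5) are easy to check numerically. The following NumPy sketch is illustrative only: the helper names `contract_mode3` and `contract_mode1`, the Gaussian factors, and the explicit rank tolerance are assumptions, not part of the paper.

```python
import numpy as np

def contract_mode3(X, a):
    """Mode-3 contraction: entry (i, j) is sum_k X[i, j, k] * a[k], as in (3)."""
    return np.einsum('ijk,k->ij', X, a)

def contract_mode1(X, c):
    """Mode-1 contraction: entry (j, k) is sum_i X[i, j, k] * c[i]."""
    return np.einsum('ijk,i->jk', X, c)

rng = np.random.default_rng(4)
n1, n2, n3, r = 4, 5, 6, 2
U, V, W = (rng.standard_normal((n, r)) for n in (n1, n2, n3))
X = np.einsum('ir,jr,kr->ijk', U, V, W)       # rank-r tensor with factors U, V, W

a = rng.standard_normal(n3)
a /= np.linalg.norm(a)                        # random contraction vector
Xa = contract_mode3(X, a)

# Decomposition (5) in matrix form: U diag(<w_1, a>, ..., <w_r, a>) V^T.
Xa_factored = U @ np.diag(W.T @ a) @ V.T

# Coordinate contraction: a = e_k returns the k-th mode-3 slice (k = 2 here, 0-based).
e2 = np.zeros(n3)
e2[2] = 1.0

c = rng.standard_normal(n1)
Xc = contract_mode1(X, c)

# An explicit tolerance avoids counting round-off singular values as rank.
print(np.linalg.matrix_rank(Xa, tol=1e-8), np.linalg.matrix_rank(Xc, tol=1e-8))
```

Both contractions have rank at most $r$ even though the matrices themselves are $n_1 \times n_2$ and $n_2 \times n_3$, which is what makes low-rank matrix recovery applicable to them.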
Note that while (5) is a matrix decomposition of the contraction, it is not a singular value
decomposition (the components need not be orthogonal, for instance). Indeed it does not seem
“canonical” in any sense. Hence, given contractions, resolving the components is a non-trivial task.

A particular form of degeneracy we will need to avoid is situations where $\langle w_i, a \rangle = 0$ for some $i$ in (5). It
is interesting to examine this in the context of coordinate contractions. When $a = e_k$, we have
$X_a^3 = X_k^3$ (i.e. the k-th mode-3 slice), and by Lemma 2.1 we see that the tensor slices are also of rank
at most $r$. Applying $a = e_k$ in the decomposition (5), we see that if for some vector $w_i \in \mathbb{R}^{n_3}$ in the
above decomposition the k-th component of $w_i$ (i.e. $(w_i)_k$) is zero, then $\langle w_i, e_k \rangle = 0$,
and hence this component is missing in the decomposition of $X_k^3$. As a consequence the rank of $X_k^3$
drops, and in a sense information about the factors $u_i, v_i$ is “lost” from the contraction. We will
want to avoid such situations and thus introduce the following definition:

Definition 2.2. Let $\mathbf{X} = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$. We say that the contraction $X_a^3$ is non-degenerate if
$\langle w_i, a \rangle \neq 0$ for all $i = 1, \ldots, r$.

We will extend the terminology and say that the tensor $\mathbf{X}$ is non-degenerate at mode 3 and
component $k$ if the k-th tensor slice is non-degenerate, i.e. component $k$ of the vectors $w_i$, $i = 1, \ldots, r$,
are all non-zero. The above definition extends in a natural way to other modes and components.
The non-degeneracy condition is trivially satisfied (almost surely) when: 


1. The vector $a$ with respect to which the contraction is computed is suitably random, for
instance random normal. In such situations, non-degeneracy holds almost surely.

2. When $a = e_k$ (i.e. the contraction is a slice), and the tensor factors are chosen from suitable
random ensembles, e.g. when the low rank tensors are picked such that the rank one
components $u_i, v_i, w_i$ are Gaussian random vectors, or random orthogonal vectors.⁷

We will also need the following definition concerning the genericity of a pair of contractions: 

Definition 2.3. Given a tensor X = YH=i u i ® v i ® w i> a P a ^ r °f contractions Xj), X'f are 
pairwise generic if the diagonal entries of the (diagonal) D a D ff 1 are all distinct, where D a = 
diag ({wi, a), ..., (w r ,a)), D b = diag ((wi, a),..., ( w r ,b )). 


Remark We list two cases where pairwise genericity conditions hold in this paper. 

1. In the context of random contractions, for instance when the contraction vectors a,b are 
sampled uniformly and independently on the unit sphere. In this case pairwise genericity 
holds almost surely. 

2. In the context of tensor completion where $a = e_{k_1}$, $b = e_{k_2}$, the two diagonal matrices are $D_a = \mathrm{diag}((w_1)_{k_1}, \ldots, (w_r)_{k_1})$ and $D_b = \mathrm{diag}((w_1)_{k_2}, \ldots, (w_r)_{k_2})$. Thus the pairwise genericity condition is a genericity requirement of the tensor factors themselves, namely that the ratios $(w_i)_{k_1} / (w_i)_{k_2}$ all be distinct for $i = 1, \ldots, r$. We will abuse terminology, and call such a tensor pairwise generic with respect to mode 3 slices $k_1, k_2$. This form of genericity is easily seen to hold, for instance when the tensor factors are drawn from suitable random ensembles such as random normal and random uniformly distributed on the unit sphere.
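Definitions 2.2 and 2.3 can be checked numerically once the third-mode factors are known. The following is a minimal sketch, assuming the factors $w_i$ are stored as the columns of a NumPy array `W`; the function names and the tolerance `tol` are illustrative choices, not part of the paper.

```python
import numpy as np

def is_nondegenerate(W, a, tol=1e-12):
    """Definition 2.2: <w_i, a> != 0 for every column w_i of W (n3 x r)."""
    return bool(np.all(np.abs(W.T @ a) > tol))

def is_pairwise_generic(W, a, b, tol=1e-12):
    """Definition 2.3: the ratios <w_i, a> / <w_i, b> are pairwise distinct."""
    da, db = W.T @ a, W.T @ b
    if np.any(np.abs(db) <= tol):
        return False  # D_b is singular, so D_a D_b^{-1} is undefined
    ratios = np.sort(da / db)
    return bool(np.all(np.diff(ratios) > tol))
```

For coordinate contractions one would pass canonical basis vectors for `a` and `b`, which reduces the second check to the ratio condition on $(w_i)_{k_1} / (w_i)_{k_2}$ described in the remark.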

7 The latter is known as the random orthogonal model in the matrix completion literature [38].
The next lemma, a variation of which appears in [7, 31], shows that when the underlying tensor is non-degenerate, it is possible to decompose a tensor from pairwise generic contractions.


Lemma 2.2. [7, 31] Suppose we are given an order 3 tensor $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i$ of size $n_1 \times n_2 \times n_3$ satisfying the conditions of Assumption 2.1. Suppose the contractions $X^3_a$ and $X^3_b$ are non-degenerate, and consider the matrices $M_1$ and $M_2$ formed as:

$$M_1 = X^3_a (X^3_b)^\dagger, \qquad M_2 = (X^3_b)^\dagger X^3_a.$$

Then the eigenvectors of $M_1$ (corresponding to the non-zero eigenvalues) are $\{u_i\}_{i=1,\ldots,r}$, and the eigenvectors of $M_2^T$ are $\{v_i\}_{i=1,\ldots,r}$.

Proof. Suppose we are given an order 3 tensor $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. From the definition of contraction (3), it is straightforward to see that

$$X^3_a = U D_a V^T, \qquad D_a = \mathrm{diag}(a^T w_1, \ldots, a^T w_r),$$
$$X^3_b = U D_b V^T, \qquad D_b = \mathrm{diag}(b^T w_1, \ldots, b^T w_r).$$

In the above decompositions, $U \in \mathbb{R}^{n_1 \times r}$, $V \in \mathbb{R}^{n_2 \times r}$, and the matrices $D_a, D_b \in \mathbb{R}^{r \times r}$ are diagonal and non-singular (since the contractions are non-degenerate). Now,

$$M_1 := X^3_a (X^3_b)^\dagger = U D_a V^T (V^T)^\dagger D_b^{-1} U^\dagger = U D_a D_b^{-1} U^\dagger, \qquad (6)$$

and similarly we obtain

$$M_2^T = V D_b^{-1} D_a V^\dagger. \qquad (7)$$

Since we have $M_1 U = U D_a D_b^{-1}$ and $M_2^T V = V D_b^{-1} D_a$, it follows that the columns of $U$ and $V$ are eigenvectors of $M_1$ and $M_2^T$ respectively (with corresponding eigenvalues given by the diagonal matrices $D_a D_b^{-1}$ and $D_b^{-1} D_a$). □


Remark Note that while the eigenvectors $\{u_i\}, \{v_j\}$ are thus determined, a source of ambiguity remains. For a fixed ordering of the $u_i$ one needs to determine the order in which the $v_j$ are to be arranged. This can be (generically) achieved by using the (common) eigenvalues of $M_1$ and $M_2$ for pairing. If the contractions $X^3_a, X^3_b$ satisfy pairwise genericity, we see that the diagonal entries of the matrix $D_a D_b^{-1}$ are distinct. It then follows that the non-zero eigenvalues of $M_1, M_2$ are distinct, and can be used to pair the columns of $U$ and $V$.


2.3 Leurgans’ algorithm 

We now describe Leurgans' algorithm for tensor decomposition in Algorithm 1. In the next section, we build on this algorithm to solve tensor inverse problems to obtain optimal sample complexity bounds. In words, Algorithm 1 essentially turns a problem involving decomposition of tensors into that of decomposition of matrices. This is achieved by first computing mode 3 contractions of the given tensor $X$ with respect to two non-degenerate and pairwise generic vectors $a, b$ (e.g. randomly uniformly distributed on the unit sphere). Given these contractions, one can compute matrices $M_1$ and $M_2$ as described in Lemma 2.2 whose eigenvectors turn out to be precisely (up to scaling) the vectors $u_i$ and $v_i$ of the required decomposition. Finally the $w_i$ can be obtained by inverting an (overdetermined) system of linear equations, giving a unique and exact solution.


The correctness of the algorithm follows directly from Lemma 2.2.


In this paper, we extend this idea to solving ill-posed linear inverse problems of tensors. The 
key idea is that since the contractions preserve information about the tensor factors, we focus on 
recovering the contractions first. Once those are recovered, we simply need to compute eigendecompositions to recover the factors themselves.


Algorithm 1 Leurgans' algorithm for tensor decomposition

1: Input: Tensor $X$.

2: Generate contraction vectors $a, b \in \mathbb{R}^{n_3}$ (such that non-degeneracy and pairwise genericity hold).

3: Compute mode 3 contractions $X^3_a$ and $X^3_b$ respectively.

4: Compute eigen-decomposition of $M_1 := X^3_a (X^3_b)^\dagger$ and $M_2 := (X^3_b)^\dagger X^3_a$. Let $U$ and $V$ denote the matrices whose columns are the eigenvectors of $M_1$ and $M_2^T$ respectively corresponding to the non-zero eigenvalues, in sorted order. (Let $r$ be the (common) rank of $M_1$ and $M_2$.) The eigenvectors, thus arranged, are denoted as $\{u_i\}_{i=1,\ldots,r}$ and $\{v_i\}_{i=1,\ldots,r}$.

5: Solve for $w_i$, $i = 1, \ldots, r$, in the (over-determined) linear system $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i$.

6: Output: Decomposition $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i$.
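The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the rank `r` is assumed known, the data are assumed noise-free, the pseudo-inverse is formed from a rank-`r` truncated SVD, and the helper names (`pinv_rank`, `top_eigvecs`, `leurgans`) are hypothetical. Sorting by eigenvalue implements the pairing of the columns of $U$ and $V$ discussed in the remark following Lemma 2.2.

```python
import numpy as np

def pinv_rank(M, r):
    """Pseudo-inverse of M using only its r leading singular values."""
    P, s, Qt = np.linalg.svd(M, full_matrices=False)
    return Qt[:r].T @ np.diag(1.0 / s[:r]) @ P[:, :r].T

def top_eigvecs(M, r):
    """Eigenvectors of the r largest-magnitude eigenvalues, sorted by eigenvalue."""
    vals, vecs = np.linalg.eig(M)
    keep = np.argsort(-np.abs(vals))[:r]       # drop the (numerically) zero eigenvalues
    keep = keep[np.argsort(vals[keep].real)]   # common ordering used to pair U and V
    return vecs[:, keep].real

def leurgans(X, r, seed=0):
    """Decompose a rank-r tensor X (n1 x n2 x n3) into factors U, V, W."""
    rng = np.random.default_rng(seed)
    n1, n2, n3 = X.shape
    # Step 2: random contraction vectors on the unit sphere.
    a, b = rng.standard_normal(n3), rng.standard_normal(n3)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    # Step 3: mode-3 contractions X^3_a and X^3_b.
    Xa = np.tensordot(X, a, axes=([2], [0]))
    Xb = np.tensordot(X, b, axes=([2], [0]))
    # Step 4: eigenvectors of M1 and of M2^T.
    Xb_pinv = pinv_rank(Xb, r)
    U = top_eigvecs(Xa @ Xb_pinv, r)
    V = top_eigvecs((Xb_pinv @ Xa).T, r)
    # Step 5: least-squares solve for the w_i (scaling of u_i, v_i is absorbed here).
    K = np.einsum('il,jl->ijl', U, V).reshape(n1 * n2, r)
    Wt, *_ = np.linalg.lstsq(K, X.reshape(n1 * n2, n3), rcond=None)
    return U, V, Wt.T
```

Because the eigenvectors are determined only up to scale, the recovered factors generally differ from the generating ones by a rescaling per component; the reconstructed tensor is what should be compared.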


Remark Note that in the last step, instead of solving a linear system of equations to obtain the $w_i$, there is an alternative approach whereby one may compute mode 1 contractions and then obtain the factors $v_i$ and $w_i$. However, there is one minor caveat. Suppose we denote the factors obtained from the modal contractions $X^3_a$ and $X^3_b$ by $U$ and $V_1$ (we assume that these factors are normalized, i.e. the columns have unit Euclidean norm). Now, we can repeat the procedure with two more random vectors $c, d$ to compute the contractions $X^1_c$ and $X^1_d$. We can perform similar manipulations to construct matrices whose eigenvectors are the tensor factors of interest, and thence obtain (normalized) factors $V_2$ and $W$. While $V_1$ and $V_2$ essentially correspond to the same factors, the matrices themselves may (i) have their columns in different order, and (ii) have signs reversed relative to each other. Hence, while the modal contractions preserve information about the tensor factors, they may need to be properly aligned by rearranging the columns and performing sign reversals, if necessary.
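The alignment described in this remark can be sketched as below. The sketch assumes noise-free, unit-norm columns, so that each column of the reference matrix has exactly one partner (up to sign) in the second matrix; `align_factors` is a hypothetical helper name. With noisy factors a proper assignment solver would be a safer choice than the row-wise `argmax` used here.

```python
import numpy as np

def align_factors(V_ref, V_new, W_new):
    """Reorder and sign-flip the columns of (V_new, W_new) so V_new matches V_ref."""
    C = V_ref.T @ V_new                        # C[i, j] = <v_ref_i, v_new_j>
    perm = np.argmax(np.abs(C), axis=1)        # partner column for each reference column
    signs = np.sign(C[np.arange(C.shape[0]), perm])
    # The same permutation and signs are applied to both matrices simultaneously.
    return V_new[:, perm] * signs, W_new[:, perm] * signs
```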


2.4 High Level Approach 

The key observation driving the methodology concerns the separability of the measurements. Given a set of separable measurements $y = \mathcal{L}(X)$, from the definition of separability we have:

$$y = \mathcal{L}(X) = \sum_{i=1}^{n_3} w_i\, T(X^3_i) = T\left(\sum_{i=1}^{n_3} w_i X^3_i\right) = T(X^3_w).$$

In words, each separable measurement $\mathcal{L}$ acting on the tensor can also be interpreted as a measurement $T$ acting on a contraction of the tensor. Since these contractions are low rank (Lemma 2.1) when the underlying tensor is low-rank, the following nuclear norm minimization problem represents a principled, tractable heuristic for recovering the contraction:

$$\underset{Z}{\text{minimize}} \ \|Z\|_* \quad \text{subject to} \quad y = T(Z).$$
Let us informally define $T$ to be "faithful" if nuclear norm minimization succeeds in exactly recovering the tensor contractions. Provided we correctly recover two contractions each along modes 1 and 3, and furthermore these contractions are non-degenerate and pairwise generic, we can apply Leurgans' algorithm to the recovered contractions to exactly recover the unknown tensor. This yields the following meta-theorem:

Meta-Theorem. Given a low rank tensor $X$ and separable measurements

$$y^{(\ell)}_k = \mathcal{L}^{(\ell)}_k(X), \qquad \ell \in \{1, 3\}, \ k \in \{1, 2\}.$$

Suppose the $T^{(\ell)}$ are faithful and for the vectors $w^{(\ell)}_k$ the contractions $X^{\ell}_{w^{(\ell)}_k}$ are non-degenerate and pairwise generic. Then the proposed approach succeeds in exactly recovering the unknown tensor.


In the next section, we will make the above meta-theorem more precise, and detail the precise 
sample complexities for the separable random projections and tensor completion settings. We will 
see that faithfulness, non-degeneracy and pairwise genericity hold naturally in these settings. 


3 Sample Complexity Results: Third Order Case 

3.1 Tensor Recovery via Contractions 

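We start by describing the main algorithm of this paper more precisely: Tensor Recovery via Contractions (T-ReCs). We assume that we are given separable measurements $y^{(3)}_1 = \mathcal{L}^{(3)}_1(X)$, $y^{(3)}_2 = \mathcal{L}^{(3)}_2(X)$, $y^{(1)}_1 = \mathcal{L}^{(1)}_1(X)$, $y^{(1)}_2 = \mathcal{L}^{(1)}_2(X)$. We further assume that the measurements are separable as:

$$\mathcal{L}^{(3)}_1(X) = \sum_{i=1}^{n_3} a_i\, T^{(3)}(X^3_i), \qquad \mathcal{L}^{(3)}_2(X) = \sum_{i=1}^{n_3} b_i\, T^{(3)}(X^3_i),$$
$$\mathcal{L}^{(1)}_1(X) = \sum_{i=1}^{n_1} c_i\, T^{(1)}(X^1_i), \qquad \mathcal{L}^{(1)}_2(X) = \sum_{i=1}^{n_1} d_i\, T^{(1)}(X^1_i), \qquad (8)$$

where $a, b, c, d$ and $T^{(3)}$ and $T^{(1)}$ are known in advance. Given these measurements our algorithm will involve the solution of the following convex optimization problems.

$$\underset{Z_1}{\text{minimize}} \ \|Z_1\|_* \quad \text{s.t.} \quad y^{(3)}_1 = T^{(3)}(Z_1) \qquad (9)$$
$$\underset{Z_2}{\text{minimize}} \ \|Z_2\|_* \quad \text{s.t.} \quad y^{(3)}_2 = T^{(3)}(Z_2) \qquad (10)$$
$$\underset{Z_3}{\text{minimize}} \ \|Z_3\|_* \quad \text{s.t.} \quad y^{(1)}_1 = T^{(1)}(Z_3) \qquad (11)$$
$$\underset{Z_4}{\text{minimize}} \ \|Z_4\|_* \quad \text{s.t.} \quad y^{(1)}_2 = T^{(1)}(Z_4) \qquad (12)$$

Efficient computational methods have been extensively studied in recent years for solving problems of this type [23]. These matrices form the "input matrices" in the next step which is an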
adaptation of Leurgans' method. In this step we form eigendecompositions to reconstruct first the pair of factors $u_i, v_i$, and then the pairs $v_i, w_i$ (the factors are normalized). Once these are recovered the last step involves solving a linear system of equations for the weights $\lambda_i$ in

$$\mathcal{L}^{(3)}_1(X) = \sum_{i=1}^r \lambda_i\, \mathcal{L}^{(3)}_1(u_i \otimes v_i \otimes w_i) = y^{(3)}_1.$$

The pseudocode for T-ReCs is detailed in Algorithm 2.


Algorithm 2 Tensor-Recovery via Contractions (T-ReCs)

1: Input: Separable measurements $y^{(3)}_1 = \mathcal{L}^{(3)}_1(X)$, $y^{(3)}_2 = \mathcal{L}^{(3)}_2(X)$, $y^{(1)}_1 = \mathcal{L}^{(1)}_1(X)$, $y^{(1)}_2 = \mathcal{L}^{(1)}_2(X)$.

2: Solve convex optimization problems (9) and (10) to obtain optimal solutions $Z_1^*$ and $Z_2^*$ respectively.

3: Compute eigen-decomposition of $M_1 := Z_1^* (Z_2^*)^\dagger$ and $M_2 := (Z_2^*)^\dagger Z_1^*$. Let $U$ and $V$ denote the matrices whose columns are the eigenvectors of $M_1$ and $M_2^T$ respectively corresponding to the non-zero eigenvalues, in sorted order. (Let $r$ be the (common) rank of $M_1$ and $M_2$.) The eigenvectors, thus arranged, are denoted as $\{u_i\}_{i=1,\ldots,r}$ and $\{v_i\}_{i=1,\ldots,r}$.

4: Solve convex optimization problems (11) and (12) to obtain optimal solutions $Z_3^*$ and $Z_4^*$ respectively.

5: Compute eigen-decomposition of $M_3 := Z_3^* (Z_4^*)^\dagger$ and $M_4 := (Z_4^*)^\dagger Z_3^*$. Let $\tilde{V}$ and $\tilde{W}$ denote the matrices whose columns are the eigenvectors of $M_3$ and $M_4^T$ respectively corresponding to the non-zero eigenvalues, in sorted order. (Let $r$ be the (common) rank of $M_3$ and $M_4$.) The eigenvectors, thus arranged, are denoted as $\{\tilde{v}_k\}_{k=1,\ldots,r}$ and $\{\tilde{w}_k\}_{k=1,\ldots,r}$.

6: Simultaneously reorder the columns of $\tilde{V}, \tilde{W}$, also performing simultaneous sign reversals as necessary so that the columns of $V$ and $\tilde{V}$ are equal; call the resulting matrix $W$ with columns $\{w_i\}_{i=1,\ldots,r}$.

7: Solve for $\lambda_i$ in the (over-determined) linear system

$$y^{(3)}_1 = \sum_{i=1}^r \lambda_i\, \mathcal{L}^{(3)}_1(u_i \otimes v_i \otimes w_i).$$

8: Output: Recovered tensor $X = \sum_{i=1}^r \lambda_i\, u_i \otimes v_i \otimes w_i$.


We now focus on the case of recovery from random Gaussian measurements, and then move 
on to the case of recovery from partially observed samples - in these situations not only are the 
measurements separable but one can also obtain provable sample complexity bounds which are 
almost optimal. 
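3.2 Separable Random Projections

Recall that from the discussion in Section 2 and the notation introduced in Section 2.1.1 we have the following set of measurements:

$$\mathcal{L}^{(3)}_1(X) = \begin{bmatrix} \langle A_1 \otimes a, X \rangle \\ \vdots \\ \langle A_{m_1} \otimes a, X \rangle \end{bmatrix}, \qquad \mathcal{L}^{(3)}_2(X) = \begin{bmatrix} \langle A_1 \otimes b, X \rangle \\ \vdots \\ \langle A_{m_1} \otimes b, X \rangle \end{bmatrix},$$

$$\mathcal{L}^{(1)}_1(X) = \begin{bmatrix} \langle c \otimes B_1, X \rangle \\ \vdots \\ \langle c \otimes B_{m_2}, X \rangle \end{bmatrix}, \qquad \mathcal{L}^{(1)}_2(X) = \begin{bmatrix} \langle d \otimes B_1, X \rangle \\ \vdots \\ \langle d \otimes B_{m_2}, X \rangle \end{bmatrix}.$$

In the above, each $A_i \in \mathbb{R}^{n_1 \times n_2}$ and each $B_i \in \mathbb{R}^{n_2 \times n_3}$ is a random Gaussian matrix with i.i.d. $\mathcal{N}(0, 1)$ entries, and $a, b \in \mathbb{R}^{n_3}$, $c, d \in \mathbb{R}^{n_1}$ are random vectors distributed uniformly on the unit sphere. Finally, collecting all of the above measurements into a single operator, we have $y = \mathcal{L}(X)$, and the total number of samples is thus $m = 2m_1 + 2m_2$.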

In the context of random tensor sensing, (9), (10), (11) and (12) reduce to solving low rank matrix recovery problems from random Gaussian measurements, where the measurements are as detailed above.

The following lemma shows that the observations $\mathcal{L}(X)$ can essentially be thought of as linear Gaussian measurements of the contractions $X^3_a, X^3_b, X^1_c, X^1_d$. This is crucial in reducing the tensor recovery problem to the problem of recovering the tensor contractions, instead.


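Lemma 3.1. For tensor $X$, matrix $A$ and vector $a$ of commensurate dimensions,

$$\langle A \otimes a, X \rangle = \langle A, X^3_a \rangle.$$

Similarly, for a vector $c$ and matrix $B$ of commensurate dimensions

$$\langle c \otimes B, X \rangle = \langle B, X^1_c \rangle.$$

Proof. We only verify the first equality; the second equality is proved in an identical manner. Let us denote by $X^3_k$ the $k$-th mode 3 slice of $X$ where $k = 1, \ldots, n_3$. Then we have,

$$\langle A \otimes a, X \rangle = \sum_{k=1}^{n_3} a_k \langle A, X^3_k \rangle = \left\langle A, \sum_{k=1}^{n_3} a_k X^3_k \right\rangle = \langle A, X^3_a \rangle.$$

□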


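As a consequence of the above lemma, it is easy to see that

$$\langle A \otimes a, X \rangle = \langle A, X^3_a \rangle = \left\langle A, \sum_{i=1}^{n_3} a_i X^3_i \right\rangle = \sum_{i=1}^{n_3} a_i \langle A, X^3_i \rangle,$$

thus establishing separability.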
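The identity of Lemma 3.1 is easy to confirm numerically. The sketch below compares the measurement computed with the sensing tensor $A \otimes a$ formed explicitly against the same measurement computed on the $n_1 \times n_2$ contraction; the function names are illustrative only.

```python
import numpy as np

def mode3_contraction(X, a):
    """X^3_a = sum_k a_k X[:, :, k]."""
    return np.tensordot(X, a, axes=([2], [0]))

def measure_direct(X, A, a):
    """<A (x) a, X>, forming the n1 x n2 x n3 sensing tensor explicitly."""
    return float(np.sum(np.einsum('ij,k->ijk', A, a) * X))

def measure_via_contraction(X, A, a):
    """<A, X^3_a>: the same measurement, computed on the contraction."""
    return float(np.sum(A * mode3_contraction(X, a)))
```

The second form never materializes the full sensing tensor, which is the storage advantage of these operators noted in the remarks following Theorem 3.3.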

Since $X^3_a$ and $X^3_b$ are low-rank matrices, the observation operators $\mathcal{L}^{(3)}_k(X)$, $k = 1, 2$, essentially provide Gaussian random projections of $X^3_a$ and $X^3_b$, which in turn can be recovered using matrix-based techniques. The following lemma establishes "faithfulness" in the context of separable random projections.
Lemma 3.2. Suppose $m_1 > 3r(n_1 + n_2 - r)$. Then the unique solutions to problems (9) and (10) are $X^3_a$ and $X^3_b$ respectively with high probability. Similarly, if $m_2 > 3r(n_2 + n_3 - r)$ then the unique solutions to problems (11) and (12) are $X^1_c$ and $X^1_d$ respectively with high probability.

Proof. Again, we only prove the first part of the claim; the second follows in an identical manner. Note that by Lemma 3.1 and Lemma 2.1, $X^3_a$ and $X^3_b$ are feasible rank $r$ solutions to (9) and (10) respectively. By Proposition 3.11 of [15], we have that the nuclear norm heuristic succeeds in recovering rank $r$ matrices from $m_1 > 3r(n_1 + n_2 - r)$ Gaussian measurements with high probability. □


Remark In this sub-section, we will refer to events which occur with probability exceeding $1 - \exp(-C_0 n_1)$ as events that occur "with high probability" (w.h.p.). We will transparently be able to take appropriate union bounds of high probability events since the number of events being considered is small enough that the union event also holds w.h.p. (thus affecting only the constants involved). Hence, in the subsequent results, we will not need to refer to the precise probabilities.

Since the contractions $X^3_a$ and $X^3_b$ of the tensor $X$ are successfully recovered and the tensor satisfies Assumption 2.1, the second stage of Leurgans' algorithm can be used to recover the factors $u_i$ and $v_i$. Similarly, from $X^1_c$ and $X^1_d$, the factors $v_i$ and $w_i$ can be recovered. The above sequence of observations leads to the following sample complexity bound for low rank tensor recovery from random measurements:


Theorem 3.3. Let $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ be an unknown tensor of interest with rank $r < \min\{n_1, n_2, n_3\}$. Suppose we obtain samples as described by (8). Suppose $m_1 > 3r(n_1 + n_2 - r)$ and $m_2 > 3r(n_2 + n_3 - r)$. Then T-ReCs (Algorithm 2) succeeds in exactly recovering $X$ and its low rank decomposition (5) with high probability.


Proof. By Lemma 2.1, $X^3_a, X^3_b, X^1_c, X^1_d$ are all rank at most $r$. By Lemma 3.1, the tensor observations $y^{(3)}_1, y^{(3)}_2, y^{(1)}_1, y^{(1)}_2$ provide linear Gaussian measurements of $X^3_a, X^3_b, X^1_c, X^1_d$. By Lemma 3.2, the convex problems (9), (10), (11), (12) correctly recover the modal contractions $X^3_a, X^3_b, X^1_c, X^1_d$. Since the vectors $a, b, c, d$ are chosen to be randomly uniformly distributed on the unit sphere, the contractions $X^3_a, X^3_b$ are non-degenerate and pairwise generic almost surely (and similarly $X^1_c, X^1_d$). Thus, Lemma 2.2 applies and $X^3_a, X^3_b$ can be used to correctly recover the factors $u_i, v_i$, $i = 1, \ldots, r$. Again by Lemma 2.2, $X^1_c, X^1_d$ can be used to correctly recover the factors $v_i, w_i$, $i = 1, \ldots, r$. Note that due to the linear independence of the factors, the linear system of equations involving $\lambda_i$ is full column rank, over-determined, and has an exact solution. The fact that the result holds with high probability follows because one simply needs to take the union bound of the probabilities of failure of exact recovery of the contractions via the solution of (9), (10), (11), (12). □


Remarks. 


1. Theorem 3.3 yields bounds that are order optimal. Indeed, consider the number of samples $m = 2m_1 + 2m_2 \sim O(r(n_1 + n_2 + n_3))$, which by a counting argument is the same as the number of parameters in an order 3 tensor of rank $r$.

2. For symmetric tensors with symmetric factorizations of the form $X = \sum_{i=1}^r \lambda_i\, u_i \otimes u_i \otimes u_i$ this method becomes particularly simple. Steps 4, 5, 6 in Algorithm 2 become unnecessary, and the factors are revealed directly in step 3. One then only needs to solve the linear system described in step 7 to recover the scale factors. The sample complexity remains $O(nr)$, nevertheless.

3. Note that for the method we propose, the most computationally expensive step is that of solving low-rank matrix recovery problems where the matrix is of size $n_i \times n_j$ for $i, j = 1, 2, 3$. Fast algorithms with rigorous guarantees exist for solving such problems, and we can use any of these pre-existing methods. An important point to note is that other methods for minimizing the Tucker rank of a tensor by considering "matricized" tensors solve matrix recovery problems for matrices of size $n_i \times n_j n_k$, which can be far more expensive.

4. Note that the sensing operators $\langle A_i \otimes a, \cdot \rangle$ may seem non-standard (vis-a-vis the compressed
sensing literature such as W), but are very storage efficient. Indeed, one needs to only store 
random matrices $A_i, B_i$ and random vectors $a, b$. Storing each of these operators requires $O(n_1 n_2 + n_3)$ space, and is far more storage efficient than (perhaps the more suggestive) sensing operators of the form $\langle \mathcal{A}_i, \cdot \rangle$, with each $\mathcal{A}_i$ being a random tensor requiring $O(n_1 n_2 n_3)$ space. Similar "low rank" sensing operators have been used for matrix recovery [30].

5. While the results here are presented in the case where the $A_i, B_i$ are random Gaussian matrices and the $a, b$ are uniformly distributed on the sphere, the results are not truly dependent on these distributions. The $A_i, B_i$ need to be structured so that they enable low-rank matrix
recovery (i.e., they need to be “faithful”). Hence, for instance it would suffice if the entries 
of these matrices were sub-Gaussian, or had appropriate restricted isometry properties with 
respect to low rank matrices 139 j /. 
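To make the counting argument in the first remark concrete, the following sketch evaluates the sample budget implied by Theorem 3.3 next to the parameter count $r(n_1 + n_2 + n_3)$. The function name is hypothetical, and taking the smallest integers that satisfy the strict inequalities is an illustrative choice.

```python
def gaussian_sample_budget(n1, n2, n3, r):
    """Smallest sample counts satisfying the conditions of Theorem 3.3."""
    m1 = 3 * r * (n1 + n2 - r) + 1   # m1 > 3r(n1 + n2 - r)
    m2 = 3 * r * (n2 + n3 - r) + 1   # m2 > 3r(n2 + n3 - r)
    total = 2 * m1 + 2 * m2          # m = 2*m1 + 2*m2
    dof = r * (n1 + n2 + n3)         # parameter count of a rank-r decomposition
    return m1, m2, total, dof
```

For a $50 \times 50 \times 50$ tensor of rank 5 this gives $m_1 = m_2 = 1426$ and $m = 5704$ measurements, against 750 parameters and 125000 tensor entries.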


3.3 Tensor Completion 

In the context of tensor completion, for a fixed (but unknown) $X$, a subset of the entries $X_\Omega$ are revealed for some index set $\Omega \subseteq [n_1] \times [n_2] \times [n_3]$. We assume that the measurements thus revealed are in a union of four slices. For the $i$-th mode-1 slice let us define

$$\Omega^{(1)}_i := \Omega \cap S^{(1)}_i, \qquad m^{(1)}_i := \left| \Omega \cap S^{(1)}_i \right|.$$

These are precisely the set of entries revealed in the $i$-th mode-1 slice and the corresponding cardinality. Similarly for the $k$-th mode-3 slice we define

$$\Omega^{(3)}_k := \Omega \cap S^{(3)}_k, \qquad m^{(3)}_k := \left| \Omega \cap S^{(3)}_k \right|. \qquad (13)$$

We will require the existence of two distinct mode-1 slices (say $i_1^*$ and $i_2^*$) from which measurements are obtained. Indeed,

$$\mathcal{L}^{(1)}_1(X) := (X)_{\Omega^{(1)}_{i_1^*}}, \qquad \mathcal{L}^{(1)}_2(X) := (X)_{\Omega^{(1)}_{i_2^*}}. \qquad (14)$$

Similarly we will also require the existence of two different slices in mode 3 (see footnote 8), say $k_1^*$ and $k_2^*$, from which we have measurements:

$$\mathcal{L}^{(3)}_1(X) := (X)_{\Omega^{(3)}_{k_1^*}}, \qquad \mathcal{L}^{(3)}_2(X) := (X)_{\Omega^{(3)}_{k_2^*}}.$$

We will require the cardinalities of the measurements from mode 1, $m^{(1)}_{i_1^*}$ and $m^{(1)}_{i_2^*}$, and from mode 3, $m^{(3)}_{k_1^*}$ and $m^{(3)}_{k_2^*}$, to be sufficiently large so that they are faithful (to be made precise subsequently), and this will determine the sample complexity. The key aspect of the algorithm is that it only makes use of the samples in these four distinct slices. No other samples outside these four slices need be revealed at all (so that all the other $m^{(1)}_i$ and $m^{(3)}_k$ can be zero). The indices sampled from each slice are drawn uniformly and randomly without replacement. Note that for a specified $m^{(1)}_{i_1^*}$, $m^{(1)}_{i_2^*}$, $m^{(3)}_{k_1^*}$ and $m^{(3)}_{k_2^*}$ the overall sample complexity implied is $m^{(1)}_{i_1^*} + m^{(1)}_{i_2^*} + m^{(3)}_{k_1^*} + m^{(3)}_{k_2^*}$.

8 We choose modes 1 and 3 arbitrarily. Any two of the 3 modes suffice.
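The sampling model for a single slice, with indices drawn uniformly at random without replacement, can be sketched as follows. The helper name `sample_slice_entries` and its return convention (row indices, column indices, revealed values) are assumptions made for illustration.

```python
import numpy as np

def sample_slice_entries(slice_matrix, m, rng):
    """Reveal m entries of a slice, chosen uniformly without replacement."""
    flat = rng.choice(slice_matrix.size, size=m, replace=False)
    rows, cols = np.unravel_index(flat, slice_matrix.shape)
    return rows, cols, slice_matrix[rows, cols]
```

Calling this once for each of the two mode-1 slices and the two mode-3 slices yields the four measurement sets used by the algorithm; no entry outside those slices is touched.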

In the context of tensor completion, (9), (10), (11) and (12) reduce to solving low rank matrix completion problems for the slices $S^{(3)}_{k_1^*}, S^{(3)}_{k_2^*}, S^{(1)}_{i_1^*}, S^{(1)}_{i_2^*}$. Contraction recovery in this context amounts to obtaining complete slices, which can then be used as inputs to Leurgans' algorithm. There are a few important differences however, when compared to the case of recovery from Gaussian random projections. For the matrix completion sub-steps to succeed, we need the following standard incoherence assumptions from the matrix completion literature [38].

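Let $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$ represent the linear spans of the vectors $\{u_i\}_{i=1,\ldots,r}$, $\{v_i\}_{i=1,\ldots,r}$, $\{w_i\}_{i=1,\ldots,r}$. Let $P_{\mathcal{U}}$, $P_{\mathcal{V}}$ and $P_{\mathcal{W}}$ respectively represent the projection operators corresponding to $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$. The coherence of the subspace $\mathcal{U}$ (similarly for $\mathcal{V}$ and $\mathcal{W}$) is defined as:

$$\mu(\mathcal{U}) := \frac{n_1}{r} \max_{i = 1, \ldots, n_1} \| P_{\mathcal{U}}(e_i) \|^2,$$

where $\{e_i\}$ are the canonical basis vectors.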
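The coherence of a factor subspace can be computed directly from this definition. In the sketch below, the factor matrix `F` (factors as columns) is assumed to have full column rank, and an orthonormal basis `Q` of its span is obtained by a QR factorization, so that $\|P(e_i)\|^2$ equals the squared norm of the $i$-th row of `Q`. The function name is illustrative.

```python
import numpy as np

def coherence(F):
    """mu(span(F)) = (n / r) * max_i ||P e_i||^2 for an n x r matrix F."""
    n, r = F.shape
    Q, _ = np.linalg.qr(F)                 # orthonormal basis of the column span
    row_norms_sq = np.sum(Q ** 2, axis=1)  # ||P e_i||^2 = ||Q^T e_i||^2
    return n / r * float(np.max(row_norms_sq))
```

The value ranges from 1, for a span whose energy is spread evenly over the coordinates, to $n / r$, for a span containing a canonical basis vector.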

Assumption 3.1 (Incoherence). $\mu_0 := \max\{\mu(\mathcal{U}), \mu(\mathcal{V}), \mu(\mathcal{W})\}$ is a positive constant independent of the rank and the dimensions of the tensor.


Such an incoherence condition is required in order to be able to complete the matrix slices 
from the observed data [38]. We will see subsequently that when the tensor is of rank r, so are 
the different slices of the tensor and each slice will have a “thin” singular value decomposition. 
Furthermore, the incoherence assumption will also hold for these slices. 


Definition 3.1. Let $X^1_i = U \Sigma V^T$ be the singular value decomposition of the tensor slice $X^1_i$. We say that the tensor $X$ satisfies the slice condition for slice $S^{(1)}_i$ with constant $\mu^{(1)}_i$ if the element-wise infinity (max) norm

$$\| U V^T \|_\infty \leq \mu^{(1)}_i \sqrt{\frac{r}{n_2 n_3}}.$$

The slice condition is analogously defined for the slices along other modes, i.e. $S^{(2)}_j$ and $S^{(3)}_k$. We will denote by $\mu^{(2)}_j$ and $\mu^{(3)}_k$ the corresponding slice constants. We will require our distinct slices from which samples are obtained to satisfy these slice conditions.


Remark The slice conditions are standard in the matrix completion literature, see for instance [38]. As pointed out in [38], the slice conditions are not much more restrictive than the incoherence condition, because if the incoherence condition is satisfied with constant $\mu_0$ then (by a simple application of the Cauchy-Schwarz inequality) the slice condition for $S^{(1)}_i$ is also satisfied with constant $\mu^{(1)}_i \leq \mu_0 \sqrt{r}$ for all $i$ (and similarly for $\mu^{(2)}_j$ and $\mu^{(3)}_k$). Hence, the slice conditions can be done away with, and using this weaker bound only increases the sample complexity bound for exact reconstruction by a multiplicative factor of $r$.

Remark Note that the incoherence assumption and the slice condition are known to be satisfied for suitable random ensembles of models, such as the random orthogonal model, and models where the singular vectors are bounded element-wise.


The decomposition (5) ties factor information about the tensor to factor information of contractions. A direct corollary of Lemma 2.1 is that contraction matrices are incoherent whenever the tensor is incoherent:

Corollary 3.4. If the tensor satisfies the incoherence assumption, then so do the contractions. Specifically all the tensor slices satisfy incoherence.

Proof. Consider for instance the slices $X^3_k$ for $k = 1, \ldots, n_3$. By Lemma 2.1 the column and row spaces of each slice are precisely $\mathcal{U}$ and $\mathcal{V}$ respectively, thus the incoherence assumption also holds for the slices. □

We now detail our result for the tensor completion problem: 

Lemma 3.5. Given a tensor $X$ with rank $r < n_1$ which satisfies the following:

• Assumptions 2.2 and 3.1.

• The samples are obtained as described in (13), (14).

• Suppose the number of samples from each slice satisfy:

$$m^{(1)}_{i_1^*} > 32 \max\{\mu_0, (\mu^{(1)}_{i_1^*})^2\}\, r (n_2 + n_3) \log^2 n_3$$
$$m^{(1)}_{i_2^*} > 32 \max\{\mu_0, (\mu^{(1)}_{i_2^*})^2\}\, r (n_2 + n_3) \log^2 n_3$$
$$m^{(3)}_{k_1^*} > 32 \max\{\mu_0, (\mu^{(3)}_{k_1^*})^2\}\, r (n_1 + n_2) \log^2 n_2$$
$$m^{(3)}_{k_2^*} > 32 \max\{\mu_0, (\mu^{(3)}_{k_2^*})^2\}\, r (n_1 + n_2) \log^2 n_2$$

• The slice condition (Definition 3.1) for each of the four slices $S^{(1)}_{i_1^*}, S^{(1)}_{i_2^*}, S^{(3)}_{k_1^*}, S^{(3)}_{k_2^*}$ hold.

Then the unique solutions to problems (9), (10), (11) and (12) are $X^3_{k_1^*}$, $X^3_{k_2^*}$, $X^1_{i_1^*}$ and $X^1_{i_2^*}$ respectively with probability exceeding $1 - C \log(n_2)\, n_2^{-\beta}$ for some constants $C, \beta > 0$.

Proof. By Lemma 2.1, $X^3_{k_1^*}, X^3_{k_2^*}, X^1_{i_1^*}, X^1_{i_2^*}$ are all rank at most $r$. By Theorem 1.1 of [38], the convex problems (9), (10), (11), (12) correctly recover the full slices $X^3_{k_1^*}, X^3_{k_2^*}, X^1_{i_1^*}, X^1_{i_2^*}$ with high probability. (Note that the relevant incoherence conditions in [38] are satisfied due to Corollary 3.4 and the slice condition assumption. Furthermore the number of samples specified meets the sample complexity requirements of Theorem 1.1 in [38] for exact recovery.) □


Remark We note that in this sub-section, events that occur with probability exceeding $1 - C \log(n_2)\, n_2^{-\beta}$ (recall that $n_1 \leq n_2 \leq n_3$) are termed as occurring with high probability (w.h.p.). We will transparently be able to union bound these events (thus changing only the constants) and hence we refrain from mentioning these probabilities explicitly.
Theorem 3.6. Let $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ be an unknown tensor of interest with rank $r < n_1$, such that the tensor slices $X^3_{k_1^*}, X^3_{k_2^*}$ are non-degenerate and pairwise generic, and similarly $X^1_{i_1^*}, X^1_{i_2^*}$ are non-degenerate and pairwise generic. Then, under the same set of assumptions made for Lemma 3.5, the procedure outlined in Algorithm 2 succeeds in exactly recovering $X$ and its low rank decomposition (5) with high probability.

Proof. The proof follows along the same lines as that of Theorem 3.3, with Lemma 3.5 allowing us to exactly recover the slices $X^3_{k_1^*}, X^3_{k_2^*}, X^1_{i_1^*}, X^1_{i_2^*}$. Since these slices satisfy non-degeneracy and pairwise genericity, the tensor factors $u_i, v_i, w_i$, $i = 1, \ldots, r$ can be exactly recovered (up to scaling) by following steps (3), (5) and (6) of Algorithm 2. Also, the system of equations to recover $\lambda$ is given by

$$X_\Omega = \sum_{i=1}^r \lambda_i\, (u_i \otimes v_i \otimes w_i)_\Omega.$$

□


Remarks. 


1. Theorem 3.6 yields bounds that are almost order optimal when $\mu_0$ and the slice constants are constant (independent of $r$ and the dimension). Indeed, the total number of samples required is $m \sim O(r n_3 \log^2 n_3)$, which by a counting argument is nearly the same as the number of parameters in an order 3 tensor of rank $r$ (except for the additional logarithmic factor).

2. The comments about efficiency for symmetric factorizations in the Gaussian random projec¬ 
tions case hold here as well. 


3. We do not necessarily need sampling without replacement from the four slices. Similar results 
can be obtained for other sampling models such as with replacement 133 j/ . and even non- 
uniform sampling m- Furthermore, while the method proposed here for the task of matrix 
completion relies on nuclear norm minimization, a number of other approaches such as alter¬ 
nating minimization (27, can also be adopted; our algorithm relies only on the successful 

completion of the slices. 


4. Note that we can remove the slice condition altogether since the incoherence assumption implies the slice condition with $\mu_1 = \mu_0 \sqrt{r}$. Removing the slice condition then implies an overall sample complexity of $O(r^2 n_3 \log^2 n_3)$.


4 Extension to Higher Order Tensors 

The results of Section 3 can be extended to higher order tensors in a straightforward way. While
the ideas remain essentially the same, the notation is necessarily more cumbersome in this section. 
We omit some technical proofs to avoid repetition of closely analogous arguments from the third 
order case, and focus on illustrating how to extend the methods to the higher order setting. 

Consider a tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ of order $K$ and dimension $n_1 \times \cdots \times n_K$. Let us assume, without loss of generality, that $n_1 \leq n_2 \leq \ldots \leq n_K$. Let the rank of this tensor be $r < n_1$ and be given by the decomposition:

$$X = \sum_{l=1}^r u^1_l \otimes \cdots \otimes u^K_l = \sum_{l=1}^r \bigotimes_{p=1}^K u^p_l,$$

where $u^p_l \in \mathbb{R}^{n_p}$. We will be interested in slices of the given tensor that are identified by picking two consecutive modes $(k, k+1)$, and by fixing all the indices not in those modes, i.e. $i_1 \in [n_1], \ldots, i_{k-1} \in [n_{k-1}], i_{k+2} \in [n_{k+2}], \ldots, i_K \in [n_K]$. Thus the indices of a slice $S$ are:

$$S := \{i_1\} \times \cdots \times \{i_{k-1}\} \times [n_k] \times [n_{k+1}] \times \{i_{k+2}\} \times \cdots \times \{i_K\},$$

and the corresponding slice may be viewed as a matrix, denoted by $X_S$. While slices of tensors can be defined more generally (i.e. the modes need not be consecutive), in this paper we will only need to deal with such "contiguous" slices (see footnote 9). We will denote the collection of all slices where modes $(k, k+1)$ are contained to be:

$$\mathcal{S}^{(k)} := \left\{ \{i_1\} \times \cdots \times \{i_{k-1}\} \times [n_k] \times [n_{k+1}] \times \{i_{k+2}\} \times \cdots \times \{i_K\} \ \middle|\ i_1 \in [n_1], \ldots, i_K \in [n_K] \right\}.$$

Every element of $\mathcal{S}^{(k)}$ is a set of indices, and we can identify a tensor $A \in \mathbb{R}^{n_1 \times \cdots \times n_{k-1} \times n_{k+2} \times \cdots \times n_K}$ with a map $A : \mathcal{S}^{(k)} \to \mathbb{R}$. Using this identification, every element of $A$ can thus also be referenced by $S \in \mathcal{S}^{(k)}$. To keep our notation succinct, we will thus refer to $A_S$ as the element corresponding to $S$ under this identification. Thus if $S = \{i_1\} \times \cdots \times \{i_{k-1}\} \times [n_k] \times [n_{k+1}] \times \{i_{k+2}\} \times \cdots \times \{i_K\}$, the element:

$$A_S = A_{i_1, \ldots, i_{k-1}, i_{k+2}, \ldots, i_K}.$$

Using this notation, we can define a high-order contraction. A mode-$k$ contraction of $X$ with respect to a tensor $A$ is thus:

$$X^k_A := \sum_{S \in \mathcal{S}^{(k)}} A_S X_S. \qquad (15)$$

Note that since $X^k_A$ is a sum of (two-dimensional) slices, it is a matrix. As in the third order case, we will be interested in contractions where $A$ is either random or a coordinate tensor. The analogue of Lemma 2.2 for the higher order case is the following:


Lemma 4.1. Let $X$ have the decomposition $X = \sum_{l=1}^r \bigotimes_{p=1}^K u^p_l$. Then we have that the contraction $X^k_A$ has the following matrix decomposition:

$$X^k_A = \sum_{l=1}^r \nu^k_l\, u^k_l (u^{k+1}_l)^T, \qquad (16)$$

where $\nu^k_l := \left\langle A, \bigotimes_{p \neq k, k+1} u^p_l \right\rangle$. Furthermore, if $X^k_B$ is another contraction with respect to $B$, then for the matrices

$$M_1 = X^k_A (X^k_B)^\dagger, \qquad M_2 = (X^k_B)^\dagger X^k_A, \qquad (17)$$

the eigenvectors of $M_1$ and $M_2^T$ respectively are $\{u^k_l\}_{l=1,\ldots,r}$ and $\{u^{k+1}_l\}_{l=1,\ldots,r}$.

9 In general, a slice corresponding to any pair of modes $(k_1, k_2)$ suffices for our approach. However, to keep the notation simple we present the case where slices correspond to mode pairs of the form $(k, k+1)$.
Proof. It is straightforward to verify by simply expanding the definition of $X^k_A$ using the definition of contraction (15):

$$\left[ X^k_A \right]_{j_k, j_{k+1}} = \sum_{j_1, \ldots, j_{k-1}, j_{k+2}, \ldots, j_K} \ \sum_{l=1}^r A_{j_1, \ldots, j_{k-1}, j_{k+2}, \ldots, j_K} \prod_{p=1}^K (u^p_l)_{j_p}.$$

Rearranging terms, we get the decomposition (16). The eigenvectors of $M_1$ and $M_2^T$ follow along similar lines to the proof of Lemma 2.2. □
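The mode-$k$ contraction (15) amounts to summing out every mode except $k$ and $k+1$ against the weights in $A$. A minimal NumPy sketch follows, assuming `k` is 1-based as in the text and that the modes of `A` are ordered as the remaining modes of `X`; the function name is illustrative.

```python
import numpy as np

def mode_k_contraction(X, A, k):
    """X^k_A: weighted sum of the (k, k+1) slices of X, with weights A_S."""
    K = X.ndim
    others = [p for p in range(K) if p not in (k - 1, k)]  # modes summed out
    return np.tensordot(X, A, axes=(others, list(range(A.ndim))))
```

The result is an $n_k \times n_{k+1}$ matrix, and for a rank-$r$ tensor it can be compared with the right-hand side of (16) to check the decomposition numerically.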


As a consequence of the above lemma, if $X$ is of low rank, so are all the contractions. The notions of non-degeneracy and pairwise genericity of contractions extend in a natural way to the higher order case. We say that a contraction $X^k_A$ is non-degenerate if $\nu^k_l \neq 0$ for all $l = 1, \ldots, r$. Furthermore, a pair of contractions is pairwise generic if the ratios of the corresponding coefficients $\nu^k_l$ of the two contractions are all distinct for $l = 1, \ldots, r$. Non-degeneracy and pairwise genericity hold almost surely when the contractions are computed with random tensors $A, B$ from appropriate random ensembles (e.g. i.i.d. normally distributed entries). In much the same way as the third order case, Leurgans' algorithm can be used to perform decomposition of low-rank tensors using Lemma 4.1. This is described in Algorithm 3.


Algorithm 3 Leurgans' Algorithm for Higher Order Tensors

1: Input: Tensor $X$.
2: for $k = 1$ to $K - 1$ do
3:   Compute contractions $X^k_A$ and $X^k_B$ for some tensors $A$ and $B$ of appropriate dimensions, such that the contractions are non-degenerate and pairwise generic.
4:   Compute eigen-decompositions of $M_1 := X^k_A (X^k_B)^{\dagger}$ and $M_2 := (X^k_B)^{\dagger} X^k_A$. Let $\tilde{U}^k$ and $\tilde{U}^{k+1}$ denote the matrices whose columns are the eigenvectors of $M_1$ and $M_2^T$ respectively corresponding to the non-zero eigenvalues, in sorted order. (Let $r$ be the (common) rank of $M_1$ and $M_2$.)
5:   If $k = 1$, let $U^1 := \tilde{U}^1$ and $U^2 := \tilde{U}^2$.
6:   If $k \geq 2$, simultaneously reorder the columns of $\tilde{U}^k$, $\tilde{U}^{k+1}$, also performing simultaneous sign reversals as necessary, so that the columns of $\tilde{U}^k$ match the columns of $U^k$ (obtained in the previous iteration); call the resulting matrices $U^k$, $U^{k+1}$. (The eigenvectors corresponding to mode $k+1$ thus obtained are denoted as $\{u_l^{k+1}\}_{l=1,\ldots,r}$.)
7: end for
8: Solve for $\lambda_l$ in the (over-determined) linear system
   $$X = \sum_{l=1}^{r} \lambda_l \bigotimes_{k=1}^{K} u_l^k.$$
9: Output: Recovered tensor $X = \sum_{l=1}^{r} \lambda_l \bigotimes_{k=1}^{K} u_l^k$.
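The steps of Algorithm 3 can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions and not the authors' implementation: the helper names (`contract`, `rank_r_pinv`, `top_eigvecs`, `leurgans`) are hypothetical, the rank `r` is passed in instead of being read off the rank of $M_1$, the contraction tensors are drawn i.i.d. Gaussian, eigenvectors of the two matrices are paired by sorting their shared eigenvalues, and the sign reversals of step 6 are skipped because the least-squares coefficients $\lambda_l$ absorb signs and scales.

```python
import numpy as np

def contract(X, A, k):
    """Mode-k contraction (k is 0-based): sum the (k, k+1) slices of X weighted by A."""
    others = [p for p in range(X.ndim) if p not in (k, k + 1)]
    return np.tensordot(X, A, axes=(others, list(range(A.ndim))))

def rank_r_pinv(M, r):
    """Pseudo-inverse of M restricted to its r leading singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (Vt[:r].T / s[:r]) @ U[:, :r].T

def top_eigvecs(M, r):
    """Eigenvectors of the r largest-magnitude eigenvalues, ordered by eigenvalue."""
    w, V = np.linalg.eig(M)
    keep = np.argsort(-np.abs(w))[:r]
    w, V = w[keep].real, V[:, keep].real
    return V[:, np.argsort(w)]

def leurgans(X, r, rng):
    K = X.ndim
    U = [None] * K
    for k in range(K - 1):
        shape = tuple(X.shape[p] for p in range(K) if p not in (k, k + 1))
        X_A = contract(X, rng.standard_normal(shape), k)
        X_B = contract(X, rng.standard_normal(shape), k)
        P = rank_r_pinv(X_B, r)
        U_k = top_eigvecs(X_A @ P, r)         # eigenvectors of M_1
        U_k1 = top_eigvecs((P @ X_A).T, r)    # eigenvectors of M_2^T
        if k == 0:
            U[0], U[1] = U_k, U_k1
        else:
            # Reorder so the columns of U_k line up with U[k] from the previous pass.
            perm = np.argmax(np.abs(U[k].T @ U_k), axis=1)
            U[k + 1] = U_k1[:, perm]
    # Over-determined linear system for lambda (step 8), solved by least squares.
    cols = []
    for l in range(r):
        T = U[0][:, l]
        for p in range(1, K):
            T = np.multiply.outer(T, U[p][:, l])
        cols.append(T.ravel())
    D = np.stack(cols, axis=1)
    lam = np.linalg.lstsq(D, X.ravel(), rcond=None)[0]
    return lam, U, (D @ lam).reshape(X.shape)

rng = np.random.default_rng(0)
dims, r = (6, 5, 7, 6), 3                      # assumed test sizes
true_factors = [rng.standard_normal((n, r)) for n in dims]
X = np.einsum('al,bl,cl,dl->abcd', *true_factors)
lam, U, X_hat = leurgans(X, r, rng)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
print(rel_err)
```

Matching columns by absolute correlation relies on the recovered eigenvectors being unit-norm copies of the true factors up to sign, which holds here because the contractions are exact.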


Finally, the notion of separable measurements can be extended to higher order tensors in a 
natural way. 

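Definition 4.1. Consider a linear operator $\mathcal{L} : \mathbb{R}^{n_1 \times \cdots \times n_K} \to \mathbb{R}^n$. We say that $\mathcal{L}$ is separable with respect to the $k$-th mode if there exist $W \in \mathbb{R}^{n_1 \times \cdots \times n_{k-1} \times n_{k+2} \times \cdots \times n_K}$ and a linear operator $\mathcal{T}^{(k)} : \mathbb{R}^{n_k \times n_{k+1}} \to \mathbb{R}^n$ such that for every $X \in \mathbb{R}^{n_1 \times \cdots \times n_K}$:

$$\mathcal{L}(X) = \sum_{S \in \mathcal{S}^{(k)}} W_S \, \mathcal{T}^{(k)}\left(X^k_S\right).$$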

Analogous to the third order case, we assume that we are presented with two sets of separable measurements per mode:

$$y_1^{(k)} = \mathcal{L}_1^{(k)}(X) = \sum_{S \in \mathcal{S}^{(k)}} (W_1)_S \, \mathcal{T}^{(k)}\left(X^k_S\right)$$

$$y_2^{(k)} = \mathcal{L}_2^{(k)}(X) = \sum_{S \in \mathcal{S}^{(k)}} (W_2)_S \, \mathcal{T}^{(k)}\left(X^k_S\right)$$

for $k = 1, \ldots, K-1$ with each of $y_1^{(k)}, y_2^{(k)} \in \mathbb{R}^{m_k}$. Once again, by separability we have:

$$y_1^{(k)} = \mathcal{T}^{(k)}\left(X^k_{W_1}\right), \qquad y_2^{(k)} = \mathcal{T}^{(k)}\left(X^k_{W_2}\right),$$

and since the contractions $X^k_{W_1}$ and $X^k_{W_2}$ are low rank, nuclear norm minimization can be used to recover these contractions via:

$$\underset{Z_1}{\text{minimize}} \; \|Z_1\|_* \quad \text{subject to} \quad y_1^{(k)} = \mathcal{T}^{(k)}(Z_1), \qquad (18)$$

$$\underset{Z_2}{\text{minimize}} \; \|Z_2\|_* \quad \text{subject to} \quad y_2^{(k)} = \mathcal{T}^{(k)}(Z_2), \qquad (19)$$

for each $k = 1, \ldots, K-1$. After recovering the two contractions for each mode, we can then apply (the higher order) Leurgans' algorithm to recover the tensor factors. The precise algorithm is described in Algorithm 4. Provided the $\mathcal{T}^{(k)}(\cdot)$ are faithful, the tensor contractions can be successfully recovered via nuclear norm minimization. Furthermore, if the contractions are non-degenerate and pairwise generic, the method can successfully recover the entire tensor.


4.1 Separable Random Projections 

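Given tensors $A \in \mathbb{R}^{n_1 \times \cdots \times n_{K_1}}$, $B \in \mathbb{R}^{n_{K_1+1} \times \cdots \times n_{K_1+K_2}}$, $C \in \mathbb{R}^{n_{K_1+K_2+1} \times \cdots \times n_{K_1+K_2+K_3}}$ of orders $K_1$, $K_2$ and $K_3$ respectively with $K_1 + K_2 + K_3 = K$, we define their outer product as:

$$[A \otimes B \otimes C]_{i_1, \ldots, i_K} := [A]_{i_1, \ldots, i_{K_1}} \, [B]_{i_{K_1+1}, \ldots, i_{K_1+K_2}} \, [C]_{i_{K_1+K_2+1}, \ldots, i_{K_1+K_2+K_3}}.$$

Note also that the inner-product for higher order tensors is defined in the natural way:

$$\langle T, X \rangle := \sum_{i_1, \ldots, i_K} [T]_{i_1, \ldots, i_K} \, [X]_{i_1, \ldots, i_K}.$$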


Algorithm 4 T-ReCs for Higher Order Tensors

1: Input: Measurements $y_i^{(k)} = \mathcal{L}_i^{(k)}(X)$ for $k = 1, \ldots, K-1$, $i = 1, 2$.
2: for $k = 1$ to $K - 1$ do
3:   Solve convex optimization problems (18) and (19) to obtain optimal solutions $Z_1$ and $Z_2$ respectively.
4:   Compute eigen-decompositions of $M_1 := Z_1 (Z_2)^{\dagger}$ and $M_2 := (Z_2)^{\dagger} Z_1$. Let $\tilde{U}^k$ and $\tilde{U}^{k+1}$ denote the matrices whose columns are the normalized eigenvectors of $M_1$ and $M_2^T$ respectively corresponding to the non-zero eigenvalues, in sorted order. (Let $r$ be the (common) rank of $M_1$ and $M_2$.)
5:   If $k = 1$, let $U^1 := \tilde{U}^1$ and $U^2 := \tilde{U}^2$.
6:   If $k \geq 2$, simultaneously reorder the columns of $\tilde{U}^k$, $\tilde{U}^{k+1}$, also performing simultaneous sign reversals as necessary, so that the columns of $\tilde{U}^k$ match the columns of $U^k$ (obtained in the previous iteration); call the resulting matrices $U^k$, $U^{k+1}$. (The eigenvectors corresponding to mode $k+1$ thus obtained are denoted as $\{u_l^{k+1}\}_{l=1,\ldots,r}$.)
7: end for
8: Solve for $\lambda_l$ in the (over-determined) linear system
   $$y_i^{(k)} = \sum_{l=1}^{r} \lambda_l \, \mathcal{L}_i^{(k)}\left(\bigotimes_{p=1}^{K} u_l^p\right), \qquad k = 1, \ldots, K-1, \; i = 1, 2.$$
9: Output: Recovered tensor $X = \sum_{l=1}^{r} \lambda_l \bigotimes_{k=1}^{K} u_l^k$.

In this higher order setting, we also work with specific separable random projection operators, which are defined as below:

$$y_1^{(k)} = \mathcal{L}_1^{(k)}(X) = \begin{bmatrix} \langle A_k \otimes \Gamma_1^{(k)} \otimes B_k, X \rangle \\ \vdots \\ \langle A_k \otimes \Gamma_{m_k}^{(k)} \otimes B_k, X \rangle \end{bmatrix} \qquad (20)$$

$$y_2^{(k)} = \mathcal{L}_2^{(k)}(X) = \begin{bmatrix} \langle C_k \otimes \Gamma_1^{(k)} \otimes D_k, X \rangle \\ \vdots \\ \langle C_k \otimes \Gamma_{m_k}^{(k)} \otimes D_k, X \rangle \end{bmatrix} \qquad (21)$$

In the above expressions, $A_k, C_k \in \mathbb{R}^{n_1 \times \cdots \times n_{k-1}}$ and $B_k, D_k \in \mathbb{R}^{n_{k+2} \times \cdots \times n_K}$. The tensors $A_k$, $B_k$, $C_k$, $D_k$ are all chosen so that their entries are randomly and independently distributed according to $\mathcal{N}(0,1)$ and subsequently normalized to have unit Euclidean norm. The matrices $\Gamma_i^{(k)} \in \mathbb{R}^{n_k \times n_{k+1}}$ for $i = 1, \ldots, m_k$ have entries randomly and independently distributed according to $\mathcal{N}(0,1)$. For each $k$ we have $2 m_k$ measurements so that in total there are $2 \sum_{k=1}^{K-1} m_k$ measurements.

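Lemma 4.2. We have the following identity:

$$\left\langle A_k \otimes \Gamma_i^{(k)} \otimes B_k, X \right\rangle = \left\langle \Gamma_i^{(k)}, X^k_{A_k \otimes B_k} \right\rangle.$$

Proof. The proof is analogous to that of Lemma 3.1:

$$\begin{aligned}
\left\langle A_k \otimes \Gamma_i^{(k)} \otimes B_k, X \right\rangle
&= \left\langle A_k \otimes \Gamma_i^{(k)} \otimes B_k, \; \sum_{l=1}^{r} \bigotimes_{p=1}^{K} u_l^p \right\rangle \\
&\overset{(i)}{=} \sum_{l=1}^{r} \left\langle A_k, \bigotimes_{p=1}^{k-1} u_l^p \right\rangle \left\langle \Gamma_i^{(k)}, u_l^k \otimes u_l^{k+1} \right\rangle \left\langle B_k, \bigotimes_{p=k+2}^{K} u_l^p \right\rangle \\
&= \sum_{l=1}^{r} \left\langle A_k \otimes B_k, \bigotimes_{p \neq k, k+1} u_l^p \right\rangle \left\langle \Gamma_i^{(k)}, u_l^k \otimes u_l^{k+1} \right\rangle \\
&= \sum_{l=1}^{r} \nu_l^k \left\langle \Gamma_i^{(k)}, u_l^k \otimes u_l^{k+1} \right\rangle \qquad \left(\text{where } \nu_l^k = \left\langle A_k \otimes B_k, \bigotimes_{p \neq k, k+1} u_l^p \right\rangle\right) \\
&= \left\langle \Gamma_i^{(k)}, \; \sum_{l=1}^{r} \nu_l^k \, u_l^k \otimes u_l^{k+1} \right\rangle \\
&\overset{(ii)}{=} \left\langle \Gamma_i^{(k)}, X^k_{A_k \otimes B_k} \right\rangle.
\end{aligned}$$

The equality (i) follows from the identity $\langle a \otimes b \otimes c, x \otimes y \otimes z \rangle = \langle a, x \rangle \langle b, y \rangle \langle c, z \rangle$ for $a, b, c, x, y, z$ of commensurate dimensions. The equality (ii) follows from the definition of $\nu_l^k$ in Lemma 4.1. □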
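The identity in Lemma 4.2 is easy to check numerically. The snippet below is a minimal sketch under illustrative assumptions: a fourth order tensor with $k = 2$, so that $A_k$ and $B_k$ are vectors indexed by modes 1 and 4, and the hypothetical variable names `a`, `b`, `Gamma` and `X_contr`. The identity is linear in $X$, so a generic (not necessarily low rank) tensor is used.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3, n4 = 3, 4, 5, 6                 # assumed dimensions
X = rng.standard_normal((n1, n2, n3, n4))   # the identity holds for any X
a = rng.standard_normal(n1)                 # plays the role of A_k
b = rng.standard_normal(n4)                 # plays the role of B_k
Gamma = rng.standard_normal((n2, n3))       # plays the role of Gamma_i^(k)

# Left-hand side: inner product of X with the outer product A_k (x) Gamma (x) B_k.
lhs = np.sum(np.einsum('a,bc,d->abcd', a, Gamma, b) * X)

# Right-hand side: contract X against A_k (x) B_k first, then pair with Gamma.
X_contr = np.einsum('abcd,a,d->bc', X, a, b)
rhs = np.sum(Gamma * X_contr)

print(np.isclose(lhs, rhs))
```

In practice the right-hand side is the useful form: each measurement is a Gaussian linear functional of a single $n_k \times n_{k+1}$ matrix.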


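It follows immediately from Lemma 4.2 that in (20) and (21), for each $k = 1, \ldots, K-1$, $\mathcal{L}_1^{(k)}(\cdot)$ and $\mathcal{L}_2^{(k)}(\cdot)$ are in fact separable, so that Algorithm 4 is applicable. Recovering the contractions involves solving a set of nuclear norm minimization sub-problems for each $k = 1, \ldots, K-1$:

$$\underset{Z_1}{\text{minimize}} \; \|Z_1\|_* \quad \text{subject to} \quad y_1^{(k)} = \begin{bmatrix} \langle \Gamma_1^{(k)}, Z_1 \rangle \\ \vdots \\ \langle \Gamma_{m_k}^{(k)}, Z_1 \rangle \end{bmatrix} \qquad (22)$$

$$\underset{Z_2}{\text{minimize}} \; \|Z_2\|_* \quad \text{subject to} \quad y_2^{(k)} = \begin{bmatrix} \langle \Gamma_1^{(k)}, Z_2 \rangle \\ \vdots \\ \langle \Gamma_{m_k}^{(k)}, Z_2 \rangle \end{bmatrix} \qquad (23)$$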


We have the following lemma concerning the solutions of these optimization problems: 


Lemma 4.3. Suppose $m_k > 3r(n_k + n_{k+1} - r)$. Then the unique solutions to problems (22) and (23) are $X^k_{A_k \otimes B_k}$ and $X^k_{C_k \otimes D_k}$ respectively with high probability.

The proof is analogous to that of Lemma 3.5.

We have the following theorem concerning the performance of Algorithm 4.

Theorem 4.4. Let $X \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ be an unknown tensor of interest with rank $r < \min\{n_1, \ldots, n_K\}$. Suppose $m_k > 3r(n_k + n_{k+1} - r)$ for each $k = 1, \ldots, K-1$. Then the procedure outlined in Algorithm 4 succeeds in exactly recovering $X$ and its low rank decomposition with high probability.

The proof parallels that of Theorem 3.3 and is omitted for the sake of brevity.

Remarks.

• Note that the overall sample complexity is $2 \sum_{k=1}^{K-1} m_k$, i.e., $6 \sum_{k=1}^{K-1} r(n_k + n_{k+1} - r)$. This constitutes an order optimal sample complexity because a tensor of rank $r$ and order $K$ of these dimensions has $r \sum_{k=1}^{K} n_k$ degrees of freedom. In particular, when the tensor is "square", i.e., $n_1 = \cdots = n_K = n$, the number of degrees of freedom is $Knr$ whereas the achieved sample complexity is no larger than $12Knr$, i.e., $O(Knr)$.

• As with the third order case, the algorithm is tractable. The main operations involve solving a set of matrix nuclear norm minimization problems (i.e. convex programs), computing eigenvectors, and aligning them. All of these are routine, efficiently solvable steps (thus "polynomial time") and indeed enable our algorithm to be scalable.
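To make the counting in the first remark concrete, the following sketch evaluates the measurement threshold, the number of degrees of freedom and the ambient dimension for one illustrative case. The function names and the choice $n = 30$, $K = 3$, $r = 5$ are assumptions made for this example only.

```python
def random_projection_budget(dims, r):
    """Threshold 6 * sum_k r (n_k + n_{k+1} - r) on the total number of measurements."""
    return 6 * sum(r * (dims[k] + dims[k + 1] - r) for k in range(len(dims) - 1))

def degrees_of_freedom(dims, r):
    """r * sum_k n_k, the parameter count of a rank-r decomposition."""
    return r * sum(dims)

n, K, r = 30, 3, 5                      # assumed "square" example
dims = (n,) * K
budget = random_projection_budget(dims, r)
dof = degrees_of_freedom(dims, r)
ambient = n ** K                        # number of entries of the full tensor
upper = 12 * K * n * r                  # the 12 K n r bound from the remark

print(budget, dof, ambient, upper)      # 3300 450 27000 5400
```

For these values the threshold is a small constant multiple of the degrees of freedom and far below the 27000 entries of the full tensor.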

4.2 Tensor Completion 

The method described in Section 3 for tensor completion can also be extended to higher order tensors in a straightforward way. Consider a tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ of order $K$ and dimensions $n_1 \times \cdots \times n_K$. Let the rank of this tensor be $r < \min\{n_1, \ldots, n_K\}$ and be given by the decomposition:

$$X = \sum_{l=1}^{r} u_l^1 \otimes \cdots \otimes u_l^K = \sum_{l=1}^{r} \bigotimes_{p=1}^{K} u_l^p,$$

where $u_l^p \in \mathbb{R}^{n_p}$. Extending the sampling notation for tensor completion from the third order case, we define $\Omega$ to be the set of indices corresponding to the observed entries of the unknown low rank tensor $X$, and define:

$$\Omega^{(k)} := S^{(k)} \cap \Omega, \qquad m^{(k)} := |\Omega^{(k)}|,$$

where $S^{(k)} \in \mathcal{S}^{(k)}$. Akin to the third-order case, along each pair of consecutive modes, we will need samples from two distinguished slices. We denote the index set of these distinct slices by $S_1^{(k)}$ and $S_2^{(k)}$, the corresponding slices by $X_1^k$ and $X_2^k$, the index set of the samples revealed from these slices by $\Omega_1^{(k)}$ and $\Omega_2^{(k)}$, and their cardinality by $m_1^{(k)}$ and $m_2^{(k)}$.

It is a straightforward exercise to argue that observations obtained from each slice $S_i^{(k)}$, $i = 1, 2$, correspond to separable measurements, so that Algorithm 4 applies. The first step of Algorithm 4 involves solving a set of nuclear norm minimization sub-problems (two problems for each $k = 1, \ldots, K-1$) to recover the slices:

$$\underset{Z_1}{\text{minimize}} \; \|Z_1\|_* \quad \text{subject to} \quad X_{\Omega_1^{(k)}} = [Z_1]_{\Omega_1^{(k)}} \qquad (24)$$

$$\underset{Z_2}{\text{minimize}} \; \|Z_2\|_* \quad \text{subject to} \quad X_{\Omega_2^{(k)}} = [Z_2]_{\Omega_2^{(k)}} \qquad (25)$$

Each of these optimization problems will succeed in recovering the corresponding slices provided incoherence, non-degeneracy and the slice conditions hold (note that these notions all extend to the higher order setting in a transparent manner). Under these assumptions, if the entries from each slice are uniformly randomly sampled with cardinality $m_i^{(k)} > C(n_k + n_{k+1}) \log^2(n_{k+1})$, $i = 1, 2$, for some constant $C$, then the unique solutions to problems (24) and (25) will be $X_1^k$ and $X_2^k$ respectively with high probability.

Once the slices $X_1^k$ and $X_2^k$ are recovered correctly using (24), (25) for each $k = 1, \ldots, K-1$, one can compute $M_1 := X_1^k (X_2^k)^{\dagger}$ and $M_2 := (X_2^k)^{\dagger} X_1^k$ and perform eigen-decompositions to obtain the factors (up to possible rescaling) $\{u_l^k\}$ and $\{u_l^{k+1}\}$. Finally, once the tensor factors are recovered, the tensor itself can be recovered exactly by solving a system of linear equations. These observations can be summarized by the following theorem:


Theorem 4.5. Let $X \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ be an unknown tensor of interest with rank $r < \min\{n_1, \ldots, n_K\}$. Suppose we obtain $m_1^{(k)}$ and $m_2^{(k)}$ random samples from each of the two distinct mode $k$ slices for each $k = 1, \ldots, K-1$. Furthermore suppose the tensor $X$ is incoherent, satisfies the slice conditions for each mode, and the slices from which samples are obtained satisfy non-degeneracy and pairwise genericity for each mode. Then there exists a constant $C$ such that if

$$m_i^{(k)} > C(n_k + n_{k+1}) \log^2(n_{k+1}), \qquad i \in \{1, 2\},$$

the procedure outlined in Algorithm 4 succeeds in exactly recovering $X$ and its low rank decomposition with high probability.

We finally remark that the resulting sample complexity of the entire algorithm is $\sum_{k=1}^{K-1} \left(m_1^{(k)} + m_2^{(k)}\right)$, which is $O(K r n_K \log^2(n_K))$.
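The reason slice sampling fits the separable framework is that a slice is itself a contraction with a coordinate tensor, and revealing entries of the slice is a linear map applied to that contraction. The sketch below illustrates this for a fourth order tensor and the mode pair $(1, 2)$; the dimensions, the slice index `(s3, s4)`, the sampling rate and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
dims = (4, 5, 3, 6)                          # assumed dimensions
X = rng.standard_normal(dims)

# Coordinate tensor selecting the (mode-1, mode-2) slice at indices (s3, s4).
s3, s4 = 1, 4
E = np.zeros((dims[2], dims[3]))
E[s3, s4] = 1.0
slice_via_contraction = np.einsum('abcd,cd->ab', X, E)

# Revealing a random subset Omega of the slice entries is a linear map
# applied to the contraction, i.e. a separable measurement.
mask = rng.random((dims[0], dims[1])) < 0.5
observed_from_tensor = X[:, :, s3, s4][mask]
observed_from_contraction = slice_via_contraction[mask]

print(np.allclose(observed_from_tensor, observed_from_contraction))
```

Recovering the slice from the revealed entries is then an ordinary matrix completion problem of the form (24) or (25).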


5 Experiments 

In this section we present numerical evidence in support of our algorithm. We conduct experiments involving (suitably) random low-rank target tensors and their recovery from (a) Separable Random Projections and (b) Tensor Completion. We obtain phase transition plots for the same, and compare our performance to that obtained from the matrix-unfolding based approach proposed in [35]. For the phase transition plots, we implemented matrix completion using the method proposed in [37], since the SDP approach for exact matrix completion of unfolded tensors was found to be impractical for even moderate-sized problems.

5.1 Separable Random Projections: Phase Transition

In this section, we run experiments comparing T-ReCs to tensor recovery methods based on "matricizing" the tensor via unfolding [35]. We consider a tensor of size $30 \times 30 \times 30$ whose factors $U, V, W \in \mathbb{R}^{n \times r}$ have i.i.d. standard Gaussian entries. We vary the rank $r$ from 2 to 10, and look to recover these tensors from different numbers of measurements $m \in [2, 20] \cdot n$. For each $(r, m)$ pair, we repeated the experiment 10 times, and consider recovery a "success" if the MSE is less than $10^{-5}$. Figure 1 shows that the number of measurements needed for accurate tensor recovery is typically smaller for our method than for methods in which the entire tensor is converted to a matrix for low rank recovery.

(a) Tensor recovery using T-ReCs (n = 30).
(b) Tensor recovery using [35] (n = 30).

Figure 1: Phase transition diagram for tensor recovery using our method. White indicates a probability of recovery of 1, while black indicates failure of exact recovery. Note that in the matrix unfolding case, one requires more measurements compared to our method to achieve the same probability of recovery for a given rank.
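The experimental setup of this section can be sketched as follows. This is only an illustration of the test-tensor generation and success criterion described above, not the authors' experiment code: the helper names are hypothetical, the MSE is assumed to be the mean squared entrywise error, and the measurement grid is assumed to consist of integer multiples of $n$.

```python
import numpy as np

def random_low_rank_tensor(n, r, rng):
    """n x n x n tensor of rank (at most) r with i.i.d. standard Gaussian factors."""
    U, V, W = (rng.standard_normal((n, r)) for _ in range(3))
    return np.einsum('il,jl,kl->ijk', U, V, W)

def is_success(X_hat, X, tol=1e-5):
    """Success criterion: mean squared entrywise error below tol (assumed definition)."""
    return np.mean((X_hat - X) ** 2) < tol

rng = np.random.default_rng(0)
n = 30
X = random_low_rank_tensor(n, 4, rng)

# Grid of (rank, number of measurements) pairs: r in 2..10, m in {2n, ..., 20n}.
grid = [(r, c * n) for r in range(2, 11) for c in range(2, 21)]
print(X.shape, len(grid))
```

Each grid point would then be repeated over independent trials, with the fraction of successes giving one pixel of the phase transition plot.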


5.2 Tensor Completion: Phase Transition 

We again considered tensors of size $30 \times 30 \times 30$, varied the rank of the tensors from 2 to 10, and obtained random measurements from four slices (without loss of generality we may assume they are the first 2 slices across modes 1 and 2). The number of measurements obtained varied as $n \times [2, 20]$. Figure 2 shows the phase transition plots of our method. We deem the method to be a "success" if the MSE of the recovered tensor is less than $10^{-5}$. Results were averaged over 10 independent trials.


5.3 Speed Comparisons 


We finally compared the time taken to recover an $n \times n \times n$ tensor of rank 3. Figure 3(a) shows that T-ReCs, with four smaller nuclear norm minimizations, is far more scalable computationally than the method of unfolding the tensor to a large matrix and then solving a single nuclear norm minimization program. This follows since matricizing the tensor involves solving for an $n^2 \times n$ matrix. Our method can thus be used for tensors that are orders of magnitude larger than those competing methods can handle.

Along lines similar to the recovery case, we compared execution times to complete a $35 \times 35 \times 35$ sized tensor. Figure 3(b) shows again that the matrix completion approach takes orders of magnitude more time than that taken by our method. We average the results over 10 independent trials, and set $r = 5$, $m = 3nr$.

(a) Phase transition for tensor completion using T-ReCs (n = 30).
(b) Phase transition for tensor completion using [35] (n = 30).

Figure 2: Phase transition plots for tensor recovery. Results are averaged over 10 independent trials. White indicates success whereas black indicates failure.

(a) Time taken to recover third order tensors. The numbers 5 and 10 in the legend refer to the cases where we obtain $5n$ and $10n$ measurements respectively.
(b) Time taken for tensor completion by our method (T-ReCs) compared to that of flattening the tensor (Matrix Unfolding).


6 Conclusion and Future Directions 

We introduced a computational framework for exact recovery of low rank tensors. A new class of measurements, known as separable measurements, was defined, and sensing mechanisms of practical interest such as random projections and tensor completion with samples restricted to a few slices were shown to fit into the separable framework. Our algorithm, known as T-ReCs, built on the classical Leurgans' algorithm for tensor decomposition, was shown to be computationally efficient, and to enjoy almost optimal sample complexity guarantees in both the random projection and the completion settings. A number of interesting avenues for further research follow naturally as a consequence of this work:

1. Robustness: Our algorithm has been analyzed in the context of noiseless measurements. It would be interesting to study variations of the approach and the resulting performance guarantees in the case when measurements are noisy, in the spirit of the matrix completion literature.

2. Non-separable measurements: Our approach relies fundamentally on the measurements being separable. Tensor inverse problems, such as tensor completion in the setting when samples are obtained randomly and uniformly from the tensor, do not fit into the separable framework. Algorithmic approaches for non-separable measurements thus remain an important avenue for further research.

3. Tensors of intermediate rank: Unlike matrices, the rank of a tensor can be larger than its (largest) dimension, and indeed increase polynomially in the dimension. The approach described in this paper addresses inverse problems where the rank is smaller than the dimension (low-rank setting). Extending these methods to the intermediate rank setting is an interesting and challenging direction for future work.

4. Methods for tensor regularization: Tensor inverse problems present an interesting dichotomy with regard to rank regularization. On the one hand, there is no known natural and tractable rank-regularizer (unlike the matrix case, the nuclear norm is not known to be tractable to compute). While various relaxations for the same have been proposed, the resulting approaches (while polynomial time) are neither scalable nor known to enjoy strong sample complexity guarantees. On the other hand, matrix nuclear norm has been used in the past in conjunction with matrix unfolding, but the resulting sample complexity performance is known to be weak. Our work establishes a third approach: we bypass the need for unfolding and expensive regularization, yet achieve almost optimal sample complexity guarantees and a computational approach that is also far more scalable. However, the method applies only to the case of separable measurements. This raises interesting questions regarding the need for and relevance of tensor regularizers, and the possibility of bypassing them altogether.